Abstract

There has been considerable recent interest in "cloud storage" wherein a user asks a server to
store a large file. One issue is whether the user can verify that the server is actually storing the
file, and typically a challenge-response protocol is employed to convince the user that the file is
indeed being stored correctly. The security of these schemes is phrased in terms of an extractor
which will recover or retrieve the file given any "proving algorithm" that has a sufficiently high
success probability.

This paper treats proof-of-retrievability schemes in the model of unconditional security, where 
an adversary has unlimited computational power. In this case retrievability of the file can be 
modelled as error-correction in a certain code. We provide a general analytical framework for
such schemes that yields exact (non-asymptotic) reductions that precisely quantify conditions

for extraction to succeed as a function of the success probability of a proving algorithm, and we 
apply this analysis to several archetypal schemes. In addition, we provide a new methodology for 
the analysis of keyed POR schemes in an unconditionally secure setting, and use it to prove the 
security of a modified version of a scheme due to Shacham and Waters under a slightly restricted 
attack model, thus providing the first example of a keyed POR scheme with unconditional 
security. We also show how classical statistical techniques can be used to evaluate whether the 
responses of the prover are accurate enough to permit successful extraction. Finally, we prove 
a new lower bound on storage and communication complexity of POR schemes. 



D. Stinson's research is supported by NSERC discovery grant 203114-11.
1 Introduction to Proof-of-retrievability Schemes



In a proof-of-retrievability scheme (POR scheme) [3, 6, 8, 12], a user asks a server to store a
(possibly large) file. The file is divided into message blocks that we view as elements of a finite 
field. Typically, the file will be "encoded" using a code such as a Reed-Solomon code. The code 
provides redundancy, enabling erasures or corrupted message blocks to be corrected. 

In order for the user to be assured that the file is being stored correctly on the server, a
challenge-response protocol is periodically invoked by the user, wherein the server (the Prover) 
must give a correct response to a random challenge chosen by the user (the Verifier). This response 
will typically be a function of one or more message blocks. We do not assume that the user is storing 
the file. Therefore, in the basic version of a POR scheme, the user must precompute and store a 
sufficient number of challenge-response pairs, before transmitting the file to the server. After this is 
done, the user erases the file but retains the precomputed challenge-response pairs. Such schemes 
are termed bounded-use schemes in [6]. An alternative is to use a keyed (or unbounded-use) scheme, 
which permits an arbitrary number of challenges to be verified (i.e., the number of challenges does 
not need to be pre-determined by the user). We will discuss these a bit later. Finally, the user 
could retain a copy of the file if desired, in which case the responses do not need to be precomputed. 
This might be done if the server is just being used to store a "back-up" copy of the file. 
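The bounded-use workflow described above (precompute pairs, hand the file to the server, erase the file, then spend one stored pair per audit) can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the class `BoundedUseVerifier`, the helper `respond` and the use of a single requested block as the response are hypothetical choices that anticipate the Basic Scheme of Section 2.

```python
import random
from typing import Callable, List, Sequence, Tuple


def respond(encoded: Sequence[int], c: int) -> int:
    """Hypothetical response function: return the block at index c (as in the Basic Scheme)."""
    return encoded[c]


class BoundedUseVerifier:
    """Bounded-use verifier: precompute challenge-response pairs, then keep only the pairs."""

    def __init__(self, encoded: Sequence[int], num_audits: int, seed: int = 0) -> None:
        rng = random.Random(seed)  # illustration only; not a cryptographic RNG
        n = len(encoded)
        # Precompute (challenge, expected response) pairs while the file is still available.
        self.pairs: List[Tuple[int, int]] = []
        for _ in range(num_audits):
            c = rng.randrange(n)
            self.pairs.append((c, respond(encoded, c)))
        # From here on the verifier stores self.pairs only, not the encoded file.

    def audit(self, prover: Callable[[int], int]) -> bool:
        """Use up one stored pair: send its challenge and compare the prover's answer."""
        if not self.pairs:
            raise RuntimeError("all precomputed challenges have been used")
        c, expected = self.pairs.pop()
        return prover(c) == expected
```

Once `num_audits` pairs have been consumed, no further audits are possible, which is exactly the limitation that keyed (unbounded-use) schemes remove.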

We wish to quantify the security afforded to the user by engaging in the challenge-response protocol
(here, "security" means that the user's file can be correctly retrieved by the user). The goal is that
a server who can respond correctly to a large proportion of challenges somehow "knows" (or can
compute) the contents of the file (i.e., all the message blocks). This is formalised through the
notion of an extractor, which takes as input a description of a "proving algorithm" $\mathcal{P}$ for a certain
unspecified file, and then outputs the file. Actually, for the schemes we study in this paper, the
extractor only needs black-box access to the proving algorithm. The proving algorithm is created
by the server, who is regarded as the adversary in this game. It should be the case that a proving
algorithm that is correct with a probability that is sufficiently close to 1 will allow the extractor to
determine the correct file. The probability that the proving algorithm $\mathcal{P}$ gives a correct response for
a randomly chosen challenge is denoted by $\mathrm{succ}(\mathcal{P})$. We assume that $\mathcal{P}$ always gives some response,
so it follows that $\mathcal{P}$ will give an incorrect response with probability $1 - \mathrm{succ}(\mathcal{P})$.

To summarise, we list the components in a POR scheme and the extractor-based security definition we use. Note that we are employing standard models developed in the literature; for a more
detailed discussion, see [3, 8, 12].

• The Verifier has a message $m \in (\mathbb{F}_q)^k$ which he redundantly encodes as $M \in (\mathbb{F}_q)^n$.

• $M$ is given to the Prover. In the case of a keyed scheme, the Prover may also be supplied with
an additional tag, $S$.

• The Verifier retains appropriate information to allow him to verify responses. This may or
may not include a key $K$.

• Some number of challenges and responses are carried out by the Prover and Verifier. In each
round, the Verifier chooses a challenge $c$ and gives it to the Prover, and the Prover computes
a response $r$ which is returned to the Verifier. The Verifier then verifies whether the response is
correct.

• The computations of the Prover are described in a proving algorithm $\mathcal{P}$.

• The success probability of $\mathcal{P}$ is the probability that it gives a correct response when the
challenge is chosen randomly.

• The Extractor is given $\mathcal{P}$ and (in the case of a keyed scheme) the key $K$, and outputs an unencoded
message $\hat{m}$. Extraction succeeds if $\hat{m} = m$.

• The security of the POR scheme is quantified by proving a statement of the form "the Extractor
succeeds with probability at least $\delta$ whenever the success probability of $\mathcal{P}$ is at least $\epsilon$". In this
paper, we only consider schemes where $\delta = 1$, that is, where extraction is always successful.

1.1 Previous Related Work 

Blum et al. [2] introduced the concept of memory checking. They formalized a model where performing any sequence of store and request operations on a remote server behaves similarly to local
storage. Two seminal papers important to the development of POR schemes are [9, 11]. Lillibridge
et al. [9] first considered the problem of creating a backup of a large file by redundantly encoding
the file using an erasure code and distributing pieces of the file to one or more servers. Naor and
Rothblum studied memory checkers which use message authentication techniques to verify if a 
file is stored correctly on a remote server. They also consider authenticators, which allow a verifier 
to interact with the server and reconstruct the file provided that a sufficient number of "audits" 
are all correct. 

The concept of proof-of-retrievability is due to Juels and Kaliski [8]. A POR scheme incorporates
a challenge-response protocol in which a verifier can check that a file is being stored correctly, along 
with an extractor that will actually reconstruct the file, given the algorithm of a "prover" who is 
able to correctly respond to a sufficiently high percentage of challenges. 

There are also papers that describe the closely related (but slightly weaker) idea of a proof-of-data-possession scheme (PDP scheme), e.g., [1]. A PDP scheme permits the possibility that not all
of the message blocks can be reconstructed. Ateniese et al. [1] also introduced the idea of using
homomorphic authenticators to reduce the communication complexity of the system. Shacham and
Waters [12] showed that the scheme of Ateniese et al. can be transformed into a POR scheme by
constructing an extractor that extracts the file from the responses of the server on audits.

Bowers, Juels, and Oprea [3] used error-correcting codes, in particular the idea of an "outer" 
and an "inner" code (in much the same vein as concatenated codes), to get a good balance between 
the server storage overhead and computational overhead in responding to the audits. A paper that 
discusses a coding-theoretic approach in the setting of storage enforcement is [7]; this paper makes 
use of list-decoding techniques, but it concentrates on the storage requirements of the server. 

Dodis, Vadhan and Wichs [6] provide the first example of an unconditionally secure POR 
scheme, also constructed from an error-correcting code, with extraction performed through list 
decoding in conjunction with the use of an almost-universal hash function. We discuss this particular paper further in Section 1.3.

1.2 Our Contributions 

In this paper, we treat the general construction of extractors for POR schemes, with the aim of 
establishing the precise conditions under which extraction is possible in the setting of unconditional 
security, where the adversary is assumed to have unlimited computational capabilities. In this 
setting, it turns out that extraction can be interpreted naturally as nearest-neighbour decoding in
a certain code (which we term a "response code"). Previously, error-correcting codes have been used
in specific constructions of POR schemes; here, we propose that error-correcting codes constitute 
the natural foundation to construct as well as analyse arbitrary POR schemes. 

There are several advantages of studying unconditionally secure POR schemes. First, the 
schemes are easier to understand and analyse because we are not making use of any additional 
cryptographic primitives or unproven assumptions (e.g., PRFs, signatures, bilinear pairings, MACs,
hitting samplers, random oracle model, etc.). This allows us to give very simple exact analyses of 
various schemes. Secondly, the essential role of error-correcting codes in the design and analysis 
of POR schemes becomes clear: codes are not just a method of constructing POR schemes; rather, 
every POR scheme gives rise to a code in a natural way. 

The success of the extraction process usually depends on the distance of the code used to initially 
encode the file; when the distance of this code is increased, the extraction process will be successful 
for less successful provers, thus increasing the security afforded the user. As we mentioned earlier, 
we quantify the security of a POR scheme by specifying a value $\epsilon$ and proving that the extraction
process will always be able to extract the file, given a prover $\mathcal{P}$ with success probability $\mathrm{succ}(\mathcal{P}) > \epsilon$.
(In some other papers, weaker types of extractors are studied, such as extractors that succeed with 
some specified probability less than 1, or extractors that only recover some specified fraction of the 
original data.) This allows us to derive conditions which guarantee that extraction will succeed, 
and to compute exact (or tight) bounds, as opposed to the mainly asymptotic bounds appearing 
in [6]. 

We exemplify our approach by considering several archetypal POR schemes. We consider keyed 
schemes as well as keyless schemes. In Section 2 we first consider the fundamental case where the
server is just required to return one or more requested message blocks. Then we progress to a more 
general scheme where the server must compute a specified linear combination of certain message 
blocks. Both of these are keyless schemes. 

In Section 3 we investigate the Shacham-Waters scheme [12], which is a keyed scheme, modified
appropriately to fit the setting of unconditional security. For this scheme, we note that unconditional security can be achieved only if the prover does not have access to a verification oracle. It
is also necessary to analyse the success probability of a proving algorithm in the average case, over 
the set of keys that are consistent with the information given to the prover. This new analytical 
approach is the first to allow the construction of a secure keyed POR scheme in the unconditionally 
secure setting. 

In Section 4 we look more closely at the numerical conditions we have derived for the various
schemes we studied and we provide some useful comparisons and estimates. 

We desire that successful extraction can be accomplished whenever $\mathrm{succ}(\mathcal{P})$ exceeds some prespecified threshold. But this raises the question as to how the user is able to determine (or estimate)
$\mathrm{succ}(\mathcal{P})$. In many practical POR schemes, the only interaction a user has with the server is through
the challenge-response protocol. We show in Section 5 that classical statistical techniques can be
used to provide a systematic basis for evaluating whether the responses of the prover are accurate 
enough to permit successful extraction. 

The main overhead of keyed (unbounded-use) unconditionally secure schemes (relative to computationally secure schemes) is in the storage requirements of the user. While this may be problematic in some circumstances, there are significant applications, such as remote backup services,
for which such schemes do genuinely represent a practical solution. We also show in Section 6 that
a significant additional storage requirement cannot be avoided in this setting, by proving a new
information-theoretic lower bound on storage and communication requirements of POR schemes.
Our new bound is an improvement of the information-theoretic lower bound for memory checkers
and authenticators proven in

We note that our goal is not to identify a single "best" POR scheme, but rather to provide a 
useful new methodology to analyse the exact security of various POR schemes in the unconditionally 
secure setting. The POR problem is one that lends itself naturally to analysis in the unconditionally 
secure model; the schemes we consider are natural examples of POR protocols that do not require 
cryptographic building blocks depending on computational assumptions. 

For reference, we provide a list of notation used in this paper in Table 3 in the Appendix.

1.3 Comparison with Dodis, Vadhan and Wichs [6] 

Our work is most closely related to that of Dodis, Vadhan and Wichs [6]. In [6], it is stated that 
"there is a clear relation between our problem and the erasure/error decoding of error-correcting 
codes". Our paper is in some sense a general exploration of these relations, whereas [6] is mainly 
devoted to a specific construction for a POR scheme that is based on Reed-Solomon codes. 
We can highlight several differences between our approach and that of [6]: 

• In the setting of unconditional security, the paper [6] only provides bounded-use schemes. 
Our scheme is the first unbounded-use scheme in this setting. 

• The paper [6] mainly uses a (Reed-Solomon) code to construct a specific POR scheme. In 
contrast, we are studying the connections between an arbitrary POR scheme and the distance 
of the (related) code that describes the behaviour of the scheme on the possible queries that 
can be made to the scheme. Stated another way, our approach is to derive a code from a 
POR scheme, and then to prove security properties of the POR scheme as a consequence of 
properties of this code. 

• The paper [6] uses various tools and algorithms to construct its POR schemes, including
Reed-Solomon codes, list decoding, almost-universal hash families, and hitting samplers based 
on expander graphs. We just use an error-correcting code in our analyses. 

• We base our analyses on nearest-neighbour decoding (rather than list decoding, which was 
used in [6]), and we present conditions under which extraction will succeed with probability 
equal to 1 (in [6], extraction succeeds with probability close to 1, depending in part on 
properties of a certain class of hash functions used in the protocol). 

• The "POR codes" in [6] are actually protocols that consist of challenges and responses involving a prespecified number of message blocks; we allow challenges in which the responses
depend on an arbitrary number of message blocks.

• All our analyses are exact and concrete, whereas the analyses in [6] are asymptotic.

2 Analysis of Several Keyless Schemes 
2.1 A Basic Scheme 

As a "warm-up", we illustrate our coding theory approach by analysing a simple POR scheme,
which we call the Basic Scheme. From now on, we usually refer to the user as the Verifier and the
server as the Prover. The Basic Scheme is presented in Figure 1.

Figure 1: Basic Scheme

initialisation

Given a message $m \in \mathcal{M}$, encode $m$ as $e(m) = M \in \mathcal{M}^*$. The message encoding
function $e : \mathcal{M} \to \mathcal{M}^*$ is a public bijection. We will assume that $\mathcal{M} = (\mathbb{F}_q)^k$ and
$\mathcal{M}^* \subseteq (\mathbb{F}_q)^n$, where $q$ is a prime power and $n > k$. We write $M = (m_1, \dots, m_n)$,
where the components $m_1, \dots, m_n$ are termed message blocks.

The Verifier gives the encoded message $M$ to the Prover. The Verifier also generates
a random challenge $c \in \{1, \dots, n\}$ and stores $c$ and the message block $m_c$.

challenge-response

The Verifier gives the challenge $c$ to the Prover. The Prover responds with the message
block $r = m_c$. The Verifier checks that the response $r$ returned by the Prover matches
the stored value $m_c$.

Here is the adversarial model we use to analyse the security of a keyless POR scheme:

1. The Adversary is given the set of encoded messages $\mathcal{M}^*$.

2. The Adversary selects $m \in \mathcal{M}$.

3. The Adversary outputs a deterministic¹ proving algorithm for $m$, denoted $\mathcal{P}$.

The success probability of $\mathcal{P}$ is defined to be
$$\mathrm{succ}(\mathcal{P}) = \Pr[\mathcal{P}(c) = m_c],$$
where $e(m) = (m_1, \dots, m_n)$ and this probability is computed over a challenge $c \in \{1, \dots, n\}$ chosen
uniformly at random.

Now we wish to construct an Extractor that will take as input a proving algorithm $\mathcal{P}$ for some
unknown message $m$. The Extractor will output a message $\hat{m} \in \mathcal{M}$. We say that the POR scheme
is secure provided that $\hat{m} = m$ whenever the success probability of $\mathcal{P}$ is sufficiently close to 1.

The Extractor for the Basic Scheme is presented in Figure 2.

Theorem 2.1. Suppose that $\mathcal{P}$ is a proving algorithm for the Basic Scheme for which $\mathrm{succ}(\mathcal{P}) > 1 - d/(2n)$, where the hamming distance of the set of encoded messages $\mathcal{M}^*$ is $d$. Then the Extractor
presented in Figure 2 will always output $\hat{m} = m$.

Proof. Let $M'$ be the $n$-tuple of responses computed by $\mathcal{P}$ and denote $\delta = \mathrm{dist}(M, M')$, where
$M = e(m)$. Denote $\epsilon = \mathrm{succ}(\mathcal{P})$. Then it is easy to see that $\epsilon = 1 - \delta/n$. We want to prove that
$\hat{M} = M$. We have that $\hat{M}$ is a vector in $\mathcal{M}^*$ closest to $M'$. Since $M$ is a vector in $\mathcal{M}^*$ such that
$\mathrm{dist}(M, M') = \delta$, it must be the case that $\mathrm{dist}(\hat{M}, M') \leq \delta$. By the triangle inequality, we get
$$\mathrm{dist}(M, \hat{M}) \leq \mathrm{dist}(M, M') + \mathrm{dist}(\hat{M}, M') \leq \delta + \delta = 2\delta.$$
However,
$$2\delta = 2n(1 - \epsilon) < 2n\left(\frac{d}{2n}\right) = d.$$
Since $M$ and $\hat{M}$ are vectors in $\mathcal{M}^*$ at distance less than $d$, it follows that $\hat{M} = M$ and the Extractor
outputs $\hat{m} = e^{-1}(M) = m$, as desired. □

¹We note that, without loss of generality, the proving algorithm can be assumed to be deterministic. This follows
from the observation that any probabilistic proving algorithm can be replaced by a deterministic algorithm relative
to which the success of the extractor defined in Figure 2 will not be increased.

Figure 2: Extractor for the Basic Scheme

1. On input $\mathcal{P}$, compute the vector $M' = (m'_1, \dots, m'_n)$, where $m'_c = \mathcal{P}(c)$ for all
$c \in \{1, \dots, n\}$ (i.e., $m'_c$ is the response computed by $\mathcal{P}$ when it is given the challenge $c$).

2. Find $\hat{M} \in \mathcal{M}^*$ so that $\mathrm{dist}(M', \hat{M})$ is minimised, where $\mathrm{dist}(\cdot, \cdot)$ denotes the hamming
distance between two vectors.

3. Output $\hat{m} = e^{-1}(\hat{M})$.
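To make Figure 2 and Theorem 2.1 concrete, the following Python sketch performs the extraction by exhaustive nearest-neighbour search. The parameters are assumptions chosen for illustration: a toy Reed-Solomon code over $\mathbb{F}_7$ with $n = 6$ and $k = 2$ (so $d = 5$), with challenges indexed from 0 rather than 1. Exhaustive search is only feasible for tiny codes; it shows what the unconditional extractor computes, not how to decode efficiently.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

Q, N, K = 7, 6, 2          # toy parameters (assumption): F_7, n = 6, k = 2
POINTS = range(1, N + 1)   # evaluation points 1, ..., n in F_7


def encode(m: Sequence[int]) -> Tuple[int, ...]:
    """Toy Reed-Solomon encoding e(m): evaluate m_0 + m_1 x + ... at each point, mod Q."""
    return tuple(sum(coef * pow(x, i, Q) for i, coef in enumerate(m)) % Q for x in POINTS)


def dist(a: Sequence[int], b: Sequence[int]) -> int:
    """Hamming distance between two vectors of equal length."""
    return sum(x != y for x, y in zip(a, b))


def extract(prover: Callable[[int], int]) -> Tuple[int, ...]:
    """Extractor of Figure 2, with challenges indexed 0, ..., n-1 instead of 1, ..., n."""
    m_prime = [prover(c) for c in range(N)]          # step 1: M' = (P(c) : c)
    messages = product(range(Q), repeat=K)           # all of (F_7)^2
    # Steps 2 and 3: nearest codeword by exhaustive search, returned as its message e^{-1}.
    return min(messages, key=lambda m: dist(encode(m), m_prime))
```

For example, if the prover corrupts 2 of the 6 blocks of $e((3, 5))$, then $\mathrm{succ}(\mathcal{P}) = 2/3 > 1 - 5/12$, every other codeword is at distance at least $5 - 2 = 3$ from $M'$, and the sketch returns $(3, 5)$, as Theorem 2.1 predicts.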



2.2 General Keyless Challenge-Response Schemes 

In the simple protocol we analysed, a response to a challenge was just a particular message block 
chosen from an encoded message M. We are interested in studying more complicated protocols. 
For example, we might consider a response that consists of several message blocks from M, or is 
computed as a function of one or more message blocks. In this section, we generalise the preceding 
extraction process to handle arbitrary keyless challenge-response protocols.

In general, a challenge will be chosen from a specified challenge space $\Gamma$, and the response will
be an element of a response space $\Delta$. The response function $\rho : \mathcal{M}^* \times \Gamma \to \Delta$ computes the response
$r = \rho(M, c)$ given the encoded message $M$ and the challenge $c$.

For an encoded message $M \in \mathcal{M}^*$, we define the response vector
$$r^M = (\rho(M, c) : c \in \Gamma).$$
That is, $r^M$ contains all the responses to all possible challenges for the encoded message $M$. Finally,
define the response code (or more simply, the code) of the scheme to be
$$\mathcal{R}^* = \{r^M : M \in \mathcal{M}^*\}.$$
The codewords in $\mathcal{R}^*$ are just the response vectors that we defined above.

The Generalised Scheme is presented in Figure 3. Observe that $\mathcal{R}^* \subseteq \Delta^\gamma$, where $\gamma = |\Gamma|$. We
will assume that the mapping $M \mapsto r^M$ is an injection, and therefore the hamming distance of $\mathcal{R}^*$
is greater than 0 (in fact, we make this assumption for all the schemes we consider in this paper).
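The definitions of $r^M$ and $\mathcal{R}^*$ translate directly into code. The sketch below (helper names are our own; brute force, so only for tiny codes) computes the hamming distance $d^*$ of the response code for any response function $\rho$ supplied as a Python callable. The binary code $\mathcal{M}^*$ used in the checks is an arbitrary example with $d = 2$.

```python
from itertools import combinations
from typing import Callable, Sequence, Tuple


def response_vector(M: Sequence, challenges: Sequence, rho: Callable) -> Tuple:
    """r^M = (rho(M, c) : c in Gamma), with Gamma listed in a fixed order."""
    return tuple(rho(M, c) for c in challenges)


def hamming(u: Sequence, v: Sequence) -> int:
    """Number of coordinates in which u and v differ."""
    return sum(a != b for a, b in zip(u, v))


def response_code_distance(code: Sequence, challenges: Sequence, rho: Callable) -> int:
    """Hamming distance d* of R* = {r^M : M in M*}, by brute force over all pairs."""
    vectors = [response_vector(M, challenges, rho) for M in code]
    return min(hamming(u, v) for u, v in combinations(vectors, 2))
```

With single-block challenges the response code is essentially $\mathcal{M}^*$ itself, so $d^* = d = 2$; with challenges consisting of two blocks, the same code gives $d^* = 5$ out of $\gamma = 6$.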

We now describe the generalised adversarial model.

1. The Adversary is given the response code $\mathcal{R}^*$.

2. The Adversary selects $m \in \mathcal{M}$.

3. The Adversary outputs a deterministic proving algorithm for $m$, denoted $\mathcal{P}$.

The success probability of $\mathcal{P}$ is defined to be
$$\mathrm{succ}(\mathcal{P}) = \Pr[\mathcal{P}(c) = \rho(M, c)],$$
where $M = e(m)$ and this probability is computed over a challenge $c \in \Gamma$ chosen uniformly at
random.

Figure 3: Generalised Scheme

initialisation

Given a message $m \in \mathcal{M}$, encode $m$ as $e(m) = M \in \mathcal{M}^*$. The Verifier gives $M$ to
the Prover. The Verifier also generates a random challenge $c \in \Gamma$ and stores $c$ and
$\rho(M, c)$.

challenge-response

The Verifier gives the challenge $c$ to the Prover. The Prover responds with $r = \rho(M, c)$.
The Verifier checks that the value $r$ returned by the Prover matches the
stored value $\rho(M, c)$.

Figure 4: Extractor for the Generalised Scheme

1. On input $\mathcal{P}$, compute the vector $R' = (r'_c : c \in \Gamma)$, where $r'_c = \mathcal{P}(c)$ for all $c \in \Gamma$
(i.e., for every $c$, $r'_c$ is the response computed by $\mathcal{P}$ when it is given the challenge $c$).

2. Find $\hat{M} \in \mathcal{M}^*$ so that $\mathrm{dist}(R', r^{\hat{M}})$ is minimised.

3. Output $\hat{m} = e^{-1}(\hat{M})$.
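The extractor of Figure 4 can be sketched in the same brute-force style. Here `decode_table` is a hypothetical dictionary mapping each $M \in \mathcal{M}^*$ to $e^{-1}(M)$; it stands in for the public bijection $e$, and nothing about its form is prescribed by the scheme.

```python
from typing import Callable, Dict, Hashable, Sequence, Tuple


def hamming(u: Sequence, v: Sequence) -> int:
    """Number of coordinates in which u and v differ."""
    return sum(a != b for a, b in zip(u, v))


def generalised_extract(prover: Callable, challenges: Sequence, rho: Callable,
                        decode_table: Dict[Tuple, Hashable]) -> Hashable:
    """Extractor of Figure 4, by exhaustive nearest-neighbour search.

    decode_table maps each encoded message M in M* to m = e^{-1}(M); it is a
    hypothetical stand-in for the public bijection e.
    """
    r_prime = tuple(prover(c) for c in challenges)            # step 1: R'
    M_hat = min(decode_table,                                  # step 2: closest r^M
                key=lambda M: hamming(tuple(rho(M, c) for c in challenges), r_prime))
    return decode_table[M_hat]                                 # step 3: e^{-1}(M-hat)
```

For instance, with the binary code $\{0000, 1100, 0011, 1111\}$ and two-block challenges ($d^* = 5$, $\gamma = 6$), a prover that answers 4 of the 6 challenges correctly has $\mathrm{succ}(\mathcal{P}) = 2/3 > 7/12 = 1 - d^*/(2\gamma)$, so the correct message is recovered.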

Now we construct an Extractor that will take as input a proving algorithm $\mathcal{P}$ for some unknown
message $m$. The Extractor will output a message $\hat{m} \in \mathcal{M}$. We say that the POR scheme is secure
provided that $\hat{m} = m$ whenever the success probability of $\mathcal{P}$ is sufficiently close to 1. The Extractor
for the Generalised Scheme is presented in Figure 4.

The following theorem relates the success probability of the extractor to the hamming distance
of the response code. Its proof is essentially identical to the proof of Theorem 2.1.

Theorem 2.2. Suppose that $\mathcal{P}$ is a proving algorithm for the Generalised Scheme for which $\mathrm{succ}(\mathcal{P}) > 1 - d^*/(2\gamma)$, where the hamming distance of the response code $\mathcal{R}^*$ is $d^* > 0$. Then the Extractor
presented in Figure 4 will always output $\hat{m} = m$.

Proof. Let $R'$ be the $\gamma$-tuple of responses computed by $\mathcal{P}$ and denote $\delta = \mathrm{dist}(r^M, R')$, where
$M = e(m)$. Denote $\epsilon = \mathrm{succ}(\mathcal{P})$. Then it is easy to see that $\epsilon = 1 - \delta/\gamma$. We want to prove that
$\hat{M} = M$. We have that $r^{\hat{M}}$ is a codeword in $\mathcal{R}^*$ closest to $R'$. Since $r^M$ is a codeword such that
$\mathrm{dist}(r^M, R') = \delta$, it must be the case that $\mathrm{dist}(r^{\hat{M}}, R') \leq \delta$. By the triangle inequality, we get
$$\mathrm{dist}(r^M, r^{\hat{M}}) \leq \mathrm{dist}(r^M, R') + \mathrm{dist}(r^{\hat{M}}, R') \leq \delta + \delta = 2\delta.$$
However,
$$2\delta = 2\gamma(1 - \epsilon) < 2\gamma\left(\frac{d^*}{2\gamma}\right) = d^*.$$
Since $r^M$ and $r^{\hat{M}}$ are codewords at distance less than $d^*$, they are equal; because $M \mapsto r^M$ is an injection, $\hat{M} = M$ and the Extractor
outputs $\hat{m} = e^{-1}(M) = m$, as desired. □

Observe that the efficacy of this extraction process depends on the relative distance of the
response code $\mathcal{R}^*$, which equals $d^*/\gamma$. In the next subsections, we look at some specific examples
of challenge-response protocols.

2.3 Multiblock Challenge Scheme

We present a POR scheme that we term the Multiblock Challenge Scheme in Figure 5 (this is the
same as the "Basic PoR Code Construction" from [6]).

Figure 5: Multiblock Challenge Scheme

challenge

A challenge is a subset of $\ell$ indices $J \subseteq \{1, \dots, n\}$. Therefore, $\Gamma = \{J \subseteq \{1, \dots, n\} : |J| = \ell\}$ and $\gamma = \binom{n}{\ell}$.

response

Given the challenge $J = \{i_1, \dots, i_\ell\}$, where $1 \leq i_1 < \cdots < i_\ell \leq n$, the correct
response is the $\ell$-tuple
$$\rho(M, J) = (m_{i_1}, \dots, m_{i_\ell}).$$
Suppose the Verifier receives a response $(r_1, \dots, r_\ell)$. He then checks that $r_j = m_{i_j}$
for $1 \leq j \leq \ell$.

In this scheme, we have $\Delta = (\mathbb{F}_q)^\ell$.

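Lemma 2.3. Suppose that the hamming distance of $\mathcal{M}^*$ is $d$. Then the hamming distance of the
response code of the Multiblock Challenge Scheme is $d^* = \binom{n}{\ell} - \binom{n-d}{\ell}$.

Proof. Suppose that $M, M' \in \mathcal{M}^*$ and $M \neq M'$. Denote $\mathrm{dist}(M, M') = \delta$, $M = (m_1, \dots, m_n)$ and
$M' = (m'_1, \dots, m'_n)$. It is easy to see that $r^M_J = r^{M'}_J$ if and only if $J \subseteq \{i : m_i = m'_i\}$. From this, it
is immediate that $\mathrm{dist}(r^M, r^{M'}) = \binom{n}{\ell} - \binom{n-\delta}{\ell}$. The desired result follows because $\binom{n-\delta}{\ell}$ is non-increasing in $\delta$, $\delta \geq d$, and $\delta = d$ for some pair $M, M'$. □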
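Lemma 2.3 can be checked numerically. The sketch below assumes a toy Reed-Solomon code over $\mathbb{F}_5$ with $n = 5$ and $k = 3$ (so $d = 3$), computes $d^*$ for the Multiblock Challenge Scheme by brute force, and compares it with $\binom{n}{\ell} - \binom{n-d}{\ell}$. The function names are hypothetical.

```python
from itertools import combinations, product
from math import comb

Q, N, K = 5, 5, 3   # toy Reed-Solomon parameters (assumption), so d = N - K + 1 = 3


def encode(m):
    """Evaluate the polynomial with coefficient list m at the points 0, ..., N-1 over F_Q."""
    return tuple(sum(c * pow(x, i, Q) for i, c in enumerate(m)) % Q for x in range(N))


def multiblock_distance(ell):
    """Brute-force d* for the Multiblock Challenge Scheme on the toy code."""
    challenges = list(combinations(range(N), ell))
    code = [encode(m) for m in product(range(Q), repeat=K)]
    vectors = [tuple(tuple(M[i] for i in J) for J in challenges) for M in code]
    return min(sum(a != b for a, b in zip(u, v)) for u, v in combinations(vectors, 2))


def lemma_2_3(n, d, ell):
    """d* = C(n, ell) - C(n - d, ell), as stated in Lemma 2.3."""
    return comb(n, ell) - comb(n - d, ell)
```

For $\ell = 2$ both computations give $d^* = 10 - 1 = 9$.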

Theorem 2.4. Suppose that $\mathcal{P}$ is a proving algorithm for the Multiblock Challenge Scheme for which
$$\mathrm{succ}(\mathcal{P}) > \frac{1}{2} + \frac{\binom{n-d}{\ell}}{2\binom{n}{\ell}},$$
where the hamming distance of $\mathcal{M}^*$ is $d$. Then the Extractor presented in Figure 4 will always
output $\hat{m} = m$.

Proof. This is an immediate consequence of Theorem 2.2 and Lemma 2.3, once we verify that
$$1 - \frac{d^*}{2\gamma} = 1 - \frac{\binom{n}{\ell} - \binom{n-d}{\ell}}{2\binom{n}{\ell}} = \frac{1}{2} + \frac{\binom{n-d}{\ell}}{2\binom{n}{\ell}}.$$
□

Remark: When we set $\ell = 1$ in Theorem 2.4, we obtain Theorem 2.1.

2.4 Linear Combination Scheme

In this subsection, we consider the Linear Combination Scheme, in which a response consists of
a specified linear combination of message blocks (this could perhaps be thought of as a keyless
analogue of the Shacham-Waters scheme [12]). The scheme is presented in Figure 6. We will study
two versions of the scheme:

Version 1

Here, the challenge $V$ is any non-zero vector in $(\mathbb{F}_q)^n$, so $\gamma = q^n - 1$.

Version 2

In this version of the scheme, the challenge $V$ is a vector in $(\mathbb{F}_q)^n$ having hamming weight
equal to $\ell$, so $\gamma = \binom{n}{\ell}(q-1)^\ell$.

Figure 6: Linear Combination Scheme

challenge

A challenge is an $n$-tuple $V = (v_1, \dots, v_n) \in (\mathbb{F}_q)^n$.

response

Given the challenge $V = (v_1, \dots, v_n)$, the correct response is
$$\rho(M, V) = V \cdot M = \sum_{i=1}^{n} v_i m_i,$$
where $M = (m_1, \dots, m_n)$ and the computation is performed in $\mathbb{F}_q$. Suppose the
Verifier receives a response $r \in \mathbb{F}_q$. He then checks that $r = V \cdot M$.

In this scheme, $\Delta = \mathbb{F}_q$.



2.4.1 Analysis of Version 1

Lemma 2.5. Suppose that the hamming distance of $\mathcal{M}^*$ is $d$. The hamming distance of the response
code of version 1 of the Linear Combination Scheme is $d^* = q^n - q^{n-1}$.

Proof. Suppose $M, M' \in \mathcal{M}^*$ and $M \neq M'$. Then $V \cdot M = V \cdot M'$ if and only if $V \cdot (M - M') = 0$.
Since $M \neq M'$, there are $q^{n-1} - 1$ non-zero solutions for $V$, given $M$ and $M'$. There are $q^n - 1$ choices for $V$,
so $\mathrm{dist}(r^M, r^{M'}) = (q^n - 1) - (q^{n-1} - 1) = q^n - q^{n-1}$ for every such pair, and the desired result follows. □

We observe that $d^*$ is independent of $d$ (the hamming distance of $\mathcal{M}^*$) in version 1 of the
scheme.
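A brute-force check of Lemma 2.5 also illustrates that $d^*$ does not depend on $d$ in version 1. The sketch assumes $q$ is prime (so that arithmetic modulo $q$ is arithmetic in $\mathbb{F}_q$) and uses small, arbitrarily chosen codes; the function name is hypothetical.

```python
from itertools import combinations, product


def lc_v1_distance(code, q):
    """Brute-force d* for Version 1 of the Linear Combination Scheme (q prime)."""
    n = len(code[0])
    challenges = [V for V in product(range(q), repeat=n) if any(V)]   # non-zero V only

    def response_vector(M):
        return tuple(sum(v * m for v, m in zip(V, M)) % q for V in challenges)

    vectors = [response_vector(M) for M in code]
    return min(sum(a != b for a, b in zip(u, w)) for u, w in combinations(vectors, 2))
```

Both the full space $(\mathbb{F}_3)^3$ (with $d = 1$) and the ternary repetition code (with $d = 3$) give $d^* = 3^3 - 3^2 = 18$.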

Theorem 2.6. Suppose that $\mathcal{P}$ is a proving algorithm for version 1 of the Linear Combination
Scheme for which
$$\mathrm{succ}(\mathcal{P}) > 1 - \frac{q^n - q^{n-1}}{2(q^n - 1)}.$$
Then the Extractor presented in Figure 4 will always output $\hat{m} = m$.
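For comparison, the thresholds on $\mathrm{succ}(\mathcal{P})$ in Theorems 2.1, 2.4 and 2.6 can be evaluated with exact rational arithmetic. The function names below are hypothetical and the parameter values in the checks are arbitrary.

```python
from fractions import Fraction
from math import comb


def threshold_basic(n, d):
    """Theorem 2.1: extraction is guaranteed if succ(P) > 1 - d/(2n)."""
    return 1 - Fraction(d, 2 * n)


def threshold_multiblock(n, d, ell):
    """Theorem 2.4: succ(P) > 1/2 + C(n-d, ell) / (2 C(n, ell))."""
    return Fraction(1, 2) + Fraction(comb(n - d, ell), 2 * comb(n, ell))


def threshold_lc_v1(n, q):
    """Theorem 2.6: succ(P) > 1 - (q^n - q^(n-1)) / (2 (q^n - 1))."""
    return 1 - Fraction(q**n - q**(n - 1), 2 * (q**n - 1))
```

Setting $\ell = 1$ in the Theorem 2.4 threshold reproduces Theorem 2.1. As $n$ grows, the Theorem 2.6 threshold approaches $1 - (q-1)/(2q) = (q+1)/(2q)$; for example, it is very close to $3/4$ when $q = 2$.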
2.4.2 Analysis of Version 2 

Lemma 2.7. Suppose that $r \geq 1$ and $X \in (\mathbb{F}_q)^r$ has hamming weight equal to $r$. Then the number
of solutions $V \in (\mathbb{F}_q)^r$ to the equation $V \cdot X = 0$ in which $V$ has hamming weight equal to $r$, which
we denote by $a_r$, is given by the formula
$$a_r = \frac{q-1}{q}\left((q-1)^{r-1} - (-1)^{r-1}\right). \tag{1}$$

Proof. We prove the result by induction on $r$. When $r = 1$, there are no solutions, so $a_1 = 0$,
agreeing with (1). Now assume that (1) gives the number of solutions for $r = s - 1$, and consider
$r = s$. Let $X = (x_1, \dots, x_s)$ and define $X' = (x_1, \dots, x_{s-1})$. By induction, the number of solutions
to the equation $V' \cdot X' = 0$ in which $V'$ has hamming weight $s - 1$ is $a_{s-1}$. Each of these solutions
$V'$ can be extended to a solution of the equation $V \cdot X = 0$ by setting $v_s = 0$; in each case, the
resulting $V$ has hamming weight equal to $s - 1$. However, any other vector $V'$ of weight $s - 1$ can be
extended, uniquely, to a solution of the equation $V \cdot X = 0$ which has hamming weight equal to $s$, by setting $v_s = -(V' \cdot X')/x_s \neq 0$. Therefore,
we have
$$\begin{aligned}
a_s &= (q-1)^{s-1} - a_{s-1} \\
&= (q-1)^{s-1} - \frac{q-1}{q}\left((q-1)^{s-2} - (-1)^{s-2}\right) \\
&= (q-1)^{s-1}\left(1 - \frac{1}{q}\right) + \frac{q-1}{q}(-1)^{s-2} \\
&= \frac{q-1}{q}\left((q-1)^{s-1} - (-1)^{s-1}\right).
\end{aligned}$$
□
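Formula (1) is easy to confirm by exhaustive counting for small parameters. The sketch below assumes $q$ is prime and, by default, takes $X$ to be the all-ones vector; by Lemma 2.7, any $X$ of full hamming weight gives the same count. The function names are hypothetical.

```python
from itertools import product


def a_closed(q, r):
    """Formula (1): a_r = ((q-1)/q) * ((q-1)^(r-1) - (-1)^(r-1)); the division is exact."""
    return (q - 1) * ((q - 1) ** (r - 1) - (-1) ** (r - 1)) // q


def a_brute(q, r, X=None):
    """Count full-weight V in (F_q)^r with V . X = 0, for a full-weight X (default all ones)."""
    X = X or (1,) * r
    nonzero = range(1, q)
    return sum(1 for V in product(nonzero, repeat=r)
               if sum(v * x for v, x in zip(V, X)) % q == 0)
```

For example, with $q = 5$, $r = 3$ and $X = (1, 2, 3)$, both functions give $a_3 = 12$.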



Remark. An alternative way to prove (1) is to observe that $a_r$ is equal to the number of codewords
of weight $r$ in an MDS code having length $r$, dimension $r - 1$ and distance 2. Then (1) can be
derived from well-known formulas for the weight distribution of an MDS code. For example, in
[10, Ch. 6, Theorem 6], it is shown that the number of codewords of weight $w$ in an MDS code of
length $n$, dimension $k$ and distance $d = n - k + 1$ over $\mathbb{F}_q$ is
$$A_w = \binom{n}{w}(q-1)\sum_{j=0}^{w-d}(-1)^j\binom{w-1}{j}q^{w-d-j}.$$
If we substitute $n = w = r$, $d = 2$ into this formula, we get
$$\begin{aligned}
A_r &= (q-1)\sum_{j=0}^{r-2}(-1)^j\binom{r-1}{j}q^{r-2-j} \\
&= \frac{q-1}{q}\left(\sum_{j=0}^{r-1}\binom{r-1}{j}q^{r-1-j}(-1)^j - (-1)^{r-1}\right) \\
&= \frac{q-1}{q}\left((q-1)^{r-1} - (-1)^{r-1}\right),
\end{aligned}$$
which agrees with (1), as proven in Lemma 2.7.
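The MDS weight-distribution formula quoted above and the closed form (1) can be compared numerically. The sketch evaluates the formula exactly as stated (with our own function names) and checks the specialisation $n = w = r$, $d = 2$; as a sanity check, the nonzero codewords of all weights plus the zero codeword should number $q^k$.

```python
from math import comb


def mds_weight_count(n, k, q, w):
    """Number of weight-w codewords in an [n, k] MDS code over F_q (formula quoted above)."""
    d = n - k + 1
    if w < d:
        return 0
    return comb(n, w) * (q - 1) * sum((-1) ** j * comb(w - 1, j) * q ** (w - d - j)
                                      for j in range(w - d + 1))


def a_closed(q, r):
    """Formula (1) for a_r (an exact integer)."""
    return (q - 1) * ((q - 1) ** (r - 1) - (-1) ** (r - 1)) // q
```

Since the identity is purely algebraic, the check passes for any prime power $q$, including $q = 4, 8, 9$.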

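We will now compute the distance $d^*$ of the response code $\mathcal{R}^*$. Here is a lemma that will be
of use in computing $d^*$.

Lemma 2.8. Suppose that $M, M' \in \mathcal{M}^*$ and $M \neq M'$. Denote $\delta = \mathrm{dist}(M, M')$. Let $r^M$ and $r^{M'}$
be the corresponding vectors in the response code of Version 2 of the Linear Combination Scheme.
Then
$$\mathrm{dist}(r^M, r^{M'}) = \binom{n}{\ell}(q-1)^\ell - \binom{n-\delta}{\ell}(q-1)^\ell - \sum_{w \geq 1}\binom{\delta}{w}\binom{n-\delta}{\ell-w}(q-1)^{\ell-w}a_w, \tag{2}$$
where the $a_w$'s are given by (1).

Proof. Suppose that $M, M' \in \mathcal{M}^*$ and $M \neq M'$. Denote
$$M = (m_1, \dots, m_n) \quad\text{and}\quad M' = (m'_1, \dots, m'_n),$$
and let $\delta = \mathrm{dist}(M, M')$. Let
$$J = \{i : m_i = m'_i\} \quad\text{and}\quad J' = \{1, \dots, n\} \setminus J.$$
Observe that $|J| = n - \delta$ and $|J'| = \delta$. For any $V = (v_1, \dots, v_n)$ having hamming weight equal to
$\ell$, define
$$J_V = \{j \in J : v_j \neq 0\} \quad\text{and}\quad J'_V = \{j \in J' : v_j \neq 0\}.$$
Denote $w = |J'_V|$; then $|J_V| = \ell - w$.

Suppose $w \geq 1$. Then, given $J_V$ and $J'_V$, the number of solutions to the equation $V \cdot M = V \cdot M'$
is precisely $(q-1)^{\ell-w}a_w$: the $\ell - w$ non-zero entries of $V$ indexed by $J_V$ can be chosen freely, since $M$ and $M'$ agree there, while the entries indexed by $J'_V$ must satisfy an equation of the type counted in Lemma 2.7. When $w = 0$, the number of solutions is $(q-1)^\ell$. Summing over $w$, and
considering all possible choices for $J_V$ and $J'_V$, we see that the total number of solutions to the
equation $V \cdot M = V \cdot M'$ is
$$\binom{n-\delta}{\ell}(q-1)^\ell + \sum_{w \geq 1}\binom{\delta}{w}\binom{n-\delta}{\ell-w}(q-1)^{\ell-w}a_w.$$
Since there are $\gamma = \binom{n}{\ell}(q-1)^\ell$ challenges in total, the desired result follows. □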
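Lemma 2.8 can be verified by brute force on small parameters. Because the count in the proof depends only on which coordinates of $M - M'$ are non-zero, the sketch takes $M = 0$ and a vector $M'$ with exactly $\delta$ non-zero entries, and it assumes $q$ is prime. The function names are hypothetical.

```python
from itertools import combinations, product
from math import comb


def a_closed(q, r):
    """Formula (1) for a_r (an exact integer)."""
    return (q - 1) * ((q - 1) ** (r - 1) - (-1) ** (r - 1)) // q


def formula_2(n, q, ell, delta):
    """Right-hand side of (2): gamma minus the number of weight-ell V with V.M = V.M'."""
    gamma = comb(n, ell) * (q - 1) ** ell
    agree = comb(n - delta, ell) * (q - 1) ** ell
    agree += sum(comb(delta, w) * comb(n - delta, ell - w) * (q - 1) ** (ell - w) * a_closed(q, w)
                 for w in range(1, ell + 1))
    return gamma - agree


def brute_distance(M, M2, q, ell):
    """Count weight-ell challenges V on which V.M and V.M2 differ (mod prime q)."""
    count = 0
    for support in combinations(range(len(M)), ell):
        for vals in product(range(1, q), repeat=ell):
            diff = sum(v * (M[i] - M2[i]) for i, v in zip(support, vals)) % q
            count += diff != 0
    return count
```

For instance, with $q = 3$, $n = 5$, $\ell = 2$ and $\delta = 2$, both functions return 26 out of $\gamma = 40$ challenges.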
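We can obtain a very accurate estimate for $d^*$ by observing that
$$a_w \approx \frac{(q-1)^w}{q}$$
is a very accurate approximation. After making this substitution, the quantity (2) becomes
$$\binom{n}{\ell}(q-1)^\ell - \binom{n-\delta}{\ell}(q-1)^\ell - \frac{(q-1)^\ell}{q}\sum_{w \geq 1}\binom{\delta}{w}\binom{n-\delta}{\ell-w}, \tag{3}$$
and it is easy to see that the quantity (3) is minimised when $\delta = d$. This minimises (2), so we obtain
$$d^* \approx \binom{n}{\ell}(q-1)^\ell - \binom{n-d}{\ell}(q-1)^\ell - \frac{(q-1)^\ell}{q}\sum_{w \geq 1}\binom{d}{w}\binom{n-d}{\ell-w}.$$
We have, by Vandermonde's identity $\sum_{w \geq 0}\binom{d}{w}\binom{n-d}{\ell-w} = \binom{n}{\ell}$,
$$\sum_{w \geq 1}\binom{d}{w}\binom{n-d}{\ell-w}\frac{(q-1)^\ell}{q} = \frac{(q-1)^\ell}{q}\left(\binom{n}{\ell} - \binom{n-d}{\ell}\right). \tag{4}$$
Therefore, from (4) we get
$$d^* \approx (q-1)^\ell\left(\binom{n}{\ell} - \binom{n-d}{\ell}\right) - \frac{(q-1)^\ell}{q}\left(\binom{n}{\ell} - \binom{n-d}{\ell}\right) = \frac{(q-1)^{\ell+1}}{q}\left(\binom{n}{\ell} - \binom{n-d}{\ell}\right). \tag{5}$$

The following theorem uses the estimated value for $d^*$ derived in (5).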



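Theorem 2.9. Suppose that $\mathcal{P}$ is a proving algorithm for version 2 of the Linear Combination
Scheme for which
$$\mathrm{succ}(\mathcal{P}) > 1 - \frac{(q-1)\left(\binom{n}{\ell} - \binom{n-d}{\ell}\right)}{2q\binom{n}{\ell}},$$
where the hamming distance of $\mathcal{M}^*$ is $d$. Then the Extractor presented in Figure 4 will always
output $\hat{m} = m$.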
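To gauge how accurate the estimate (5) is, the sketch below evaluates (2) exactly at $\delta = d$, compares it with (5), and computes the Theorem 2.9 threshold $1 - d^*/(2\gamma)$ using (5). The parameter values in the checks ($n = 20$, $d = 8$, $q = 257$, $\ell = 3$) are arbitrary illustrations, not recommendations, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def a_closed(q, r):
    """Formula (1) for a_r (an exact integer)."""
    return (q - 1) * ((q - 1) ** (r - 1) - (-1) ** (r - 1)) // q


def exact_distance(n, d, q, ell):
    """Right-hand side of (2) with delta = d."""
    gamma = comb(n, ell) * (q - 1) ** ell
    agree = comb(n - d, ell) * (q - 1) ** ell + sum(
        comb(d, w) * comb(n - d, ell - w) * (q - 1) ** (ell - w) * a_closed(q, w)
        for w in range(1, ell + 1))
    return gamma - agree


def approx_distance(n, d, q, ell):
    """Estimate (5): (q-1)^(ell+1)/q * (C(n, ell) - C(n-d, ell)), as an exact fraction."""
    return Fraction((q - 1) ** (ell + 1) * (comb(n, ell) - comb(n - d, ell)), q)


def threshold_v2(n, d, q, ell):
    """Theorem 2.9 threshold 1 - d*/(2 gamma), using the estimate (5) for d*."""
    gamma = comb(n, ell) * (q - 1) ** ell
    return 1 - approx_distance(n, d, q, ell) / (2 * gamma)
```

For these parameters the relative difference between (2) at $\delta = d$ and (5) is well under one percent, and the threshold is roughly 0.6.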

3 Analysis of a Keyed Scheme: the Shacham-Waters Scheme

The Shacham-Waters Scheme [12] is a keyed proof-of-retrievability scheme. This means that the
Verifier has a secret key that is not provided to the Prover. This key is used to verify responses in
the challenge-response protocol, and it is also provided to an extraction algorithm as input. The
use of a key permits an arbitrary number of challenges to be verified, without the Verifier having
to precompute the responses.

We discuss a variation of the Shacham-Waters Scheme [12], modified to fit the unconditional
security setting. The main change is that the vector $B = (\beta_1, \dots, \beta_n)$ (which comprises part of the key)
is completely random, rather than being generated by a pseudorandom function (i.e., a PRF). This
scheme, presented in Figure 7, is termed the Modified Shacham-Waters Scheme.

As we did with the Linear Combination Scheme, we will study two versions of the scheme. In the first version, the challenge $V$ is any non-zero vector in $(\mathbb{F}_q)^n$, so $\gamma = q^n - 1$. In the second version, the challenge $V$ is a vector in $(\mathbb{F}_q)^n$ having hamming weight equal to $\ell$, so $\gamma = \binom{n}{\ell}(q-1)^{\ell}$. In both versions, $\Lambda = (\mathbb{F}_q)^2$.

Figure 7: Modified Shacham-Waters Scheme

• The key $K$ consists of $\alpha \in \mathbb{F}_q$ and $B = (\beta_1, \dots, \beta_n) \in (\mathbb{F}_q)^n$. $K$ is retained by the Verifier.

• The encoded message is $M = (m_1, \dots, m_n) \in (\mathbb{F}_q)^n$.

• The tag is $S = (\sigma_1, \dots, \sigma_n) \in (\mathbb{F}_q)^n$, where $S$ is computed using the following (vector) equation in $\mathbb{F}_q$:

$$S = B + \alpha M. \qquad (6)$$

The message $M$ and the tag $S$ are given to the Prover.

• A challenge is a vector $V = (v_1, \dots, v_n) \in (\mathbb{F}_q)^n$.

• The response consists of $(\mu, \tau) \in (\mathbb{F}_q)^2$, where the following computations are performed in $\mathbb{F}_q$:

$$\mu = V \cdot M \qquad (7)$$

and

$$\tau = V \cdot S. \qquad (8)$$

• The response $(\mu, \tau)$ is verified by checking that the following condition holds in $\mathbb{F}_q$:

$$\tau = \alpha\mu + V \cdot B. \qquad (9)$$

The information held by the various parties in the scheme is summarised as follows: 



Verifier 


P rover 


Extractor 


K = {a,B) 


M,S 


K,V 



We first observe that, from the point of view of the Prover, there are q possible keys. 

Lemma 3.1. Given $M$ and $S$, the Prover can restrict the set of possible keys $(\alpha, B)$ to

$$\mathrm{Possible}(M, S) = \{(\alpha_0, S - \alpha_0 M) : \alpha_0 \in \mathbb{F}_q\}.$$

Proof. Suppose that $\alpha = \alpha_0$. Then equation (6) implies that $B = S - \alpha_0 M$. □
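The scheme in Figure 7 and the key set of Lemma 3.1 are simple enough to sketch directly. The Python below is an illustrative sketch assuming $q$ is prime (here $q = 101$), so that field arithmetic is arithmetic modulo $q$; the function names `keygen`, `make_tag`, `respond`, `verify` and `possible_keys` are hypothetical. It checks that an authentic response is accepted under every key in $\mathrm{Possible}(M, S)$.

```python
import random

Q = 101  # assumption: a prime modulus, so F_q is the integers mod Q


def dot(u, v):
    """Inner product over F_q."""
    return sum(a * b for a, b in zip(u, v)) % Q


def keygen(n, rng):
    """K = (alpha, B); B is uniformly random (not PRF output), as in the modified scheme."""
    return rng.randrange(Q), [rng.randrange(Q) for _ in range(n)]


def make_tag(key, message):
    """Equation (6): S = B + alpha * M, coordinate-wise."""
    alpha, beta = key
    return [(b + alpha * m) % Q for b, m in zip(beta, message)]


def respond(message, tag, challenge):
    """Honest Prover, equations (7) and (8)."""
    return dot(challenge, message), dot(challenge, tag)


def verify(key, challenge, response):
    """Verifier's check, equation (9)."""
    alpha, beta = key
    mu, tau = response
    return tau == (alpha * mu + dot(challenge, beta)) % Q


def possible_keys(message, tag):
    """Lemma 3.1: the Q keys (alpha0, S - alpha0 * M) consistent with what the Prover holds."""
    return [(a0, [(s - a0 * m) % Q for s, m in zip(tag, message)]) for a0 in range(Q)]


rng = random.Random(2024)
n = 6
M = [rng.randrange(Q) for _ in range(n)]
K = keygen(n, rng)
S = make_tag(K, M)
V = [rng.randrange(1, Q) for _ in range(n)]
r = respond(M, S, V)
print(verify(K, V, r), all(verify(k, V, r) for k in possible_keys(M, S)))  # prints: True True
```

Since every key in $\mathrm{Possible}(M, S)$ accepts authentic responses, honest behaviour reveals nothing about which of the $q$ candidate values of $\alpha$ is the real one.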

We will define a response $(\mu, \tau)$ to be acceptable if (9) is satisfied. A response is authentic if it
was created using equations (7) and (8). Note that an authentic response will be acceptable for
every key $K \in \mathrm{Possible}(M, S)$. In the case of an acceptable (but perhaps not authentic) response,
we have the following useful lemma.

Lemma 3.2. Suppose that a response $(\mu, \tau)$ to a challenge $V$ for a message $M$ is acceptable for
more than one key in $\mathrm{Possible}(M, S)$. Then $(\mu, \tau)$ is authentic.
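Proof. Suppose $K_1, K_2 \in \mathrm{Possible}(M, S)$, where $K_1 = (\alpha_1, B_1)$, $K_2 = (\alpha_2, B_2)$ and $\alpha_1 \neq \alpha_2$. We have $B_1 = S - \alpha_1 M$ and $B_2 = S - \alpha_2 M$. Now consider a response $(\mu, \tau)$ to a challenge $V$ that is acceptable for both of the keys $K_1$ and $K_2$. Then

$$\tau = \alpha_1\mu + V \cdot B_1 = \alpha_2\mu + V \cdot B_2.$$

Hence $(\alpha_1 - \alpha_2)\mu + V \cdot (B_1 - B_2) = 0$. Since $B_1 - B_2 = (\alpha_2 - \alpha_1)M$, this becomes

$$(\alpha_1 - \alpha_2)(\mu - V \cdot M) = 0.$$

We have $\alpha_1 \neq \alpha_2$, so it follows that $\mu = V \cdot M$. Then we obtain

$$\tau = \alpha_1 V \cdot M + V \cdot (S - \alpha_1 M) = V \cdot S.$$

Therefore the response $(\mu, \tau)$ is authentic. □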
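Lemma 3.2 can be confirmed by exhaustive search over a small field. The sketch below assumes a prime $q = 7$ and arbitrary toy values for $M$, $S$ and $V$ (the lemma holds for any tag $S$, since the proof only uses $B_i = S - \alpha_i M$). For every possible response $(\mu, \tau) \in (\mathbb{F}_q)^2$ it counts how many keys in $\mathrm{Possible}(M, S)$ accept it; the function names are illustrative.

```python
from itertools import product

Q = 7  # assumption: a small prime, so exhaustive search over (F_q)^2 is cheap


def dot(u, v):
    return sum(a * b for a, b in zip(u, v)) % Q


def acceptable(key, challenge, response):
    """Condition (9) for the key (alpha, B)."""
    alpha, beta = key
    mu, tau = response
    return tau == (alpha * mu + dot(challenge, beta)) % Q


def possible_keys(message, tags):
    """Lemma 3.1: all keys (alpha0, S - alpha0 * M)."""
    return [(a0, [(s - a0 * m) % Q for s, m in zip(tags, message)]) for a0 in range(Q)]


def accepting_key_counts(message, tags, challenge):
    """Map each of the Q^2 responses to the number of keys in Possible(M, S) that accept it."""
    keys = possible_keys(message, tags)
    return {r: sum(acceptable(k, challenge, r) for k in keys)
            for r in product(range(Q), repeat=2)}


M, S, V = [1, 2, 3], [4, 0, 6], [1, 1, 1]
counts = accepting_key_counts(M, S, V)
authentic = (dot(V, M), dot(V, S))
# authentic response: accepted by all Q keys; any other response: by at most one key
print(counts[authentic], max(c for r, c in counts.items() if r != authentic))  # prints: 7 1
```

In fact every response with $\mu \neq V \cdot M$ is acceptable for exactly one key, namely the one with $\alpha_0 = (\tau - V \cdot S)/(\mu - V \cdot M)$; this sharper observation is what the attack below exploits.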



The verification condition (9) depends on the key, but not on the message $M$. It follows that
the Prover can create acceptable non-authentic responses if he knows the value of $\alpha$ (which in turn
uniquely determines the key, as shown in Lemma 3.1). This leads to the following attack.

Theorem 3.3. If the Prover has access to a verification oracle, then the Modified Shacham-Waters
Scheme is not unconditionally secure.

Proof. For every key $K = (\alpha_0, B_0) \in \mathrm{Possible}(M, S)$, the Prover can create a response $(\mu, \tau)$ to a challenge $V$ that will be acceptable if and only if $K$ is the actual key: take any $\mu \neq V \cdot M$ and set $\tau = \alpha_0\mu + V \cdot B_0$. (Such a response is acceptable for at most one key by Lemma 3.2.) The Prover can check the validity of these responses by accessing the verification oracle. As soon as one of these responses is accepted by the verification oracle, the Prover knows the correct value of the key. Hence the Prover can now create a proving algorithm $\mathcal{P}$ that will output acceptable but non-authentic responses. This algorithm $\mathcal{P}$ will not allow the correct message to be extracted.

In more detail, after the Prover has determined the key $K = (\alpha, B)$, he chooses an arbitrary (encoded) message $M' \neq M$ and constructs $\mathcal{P}$ as follows:

1. Given a challenge $V$, define $\mu = V \cdot M'$.

2. Then define $\tau = \alpha\mu + V \cdot B$.

Suppose the Extractor is run on $\mathcal{P}$. It is easy to see that $\mathrm{dist}(R', r^{M'}) = 0$, so the Extractor will compute $\hat{M} = M'$, which is incorrect. □
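The attack in this proof can be sketched in a few lines. The Python below is an illustration only: it assumes a prime $q = 101$, models the verification oracle as a callable that applies condition (9) with the secret key, and uses hypothetical names (`recover_key`, `forged_prover`). The Prover needs at most $q$ oracle queries, one per candidate value $\alpha_0$.

```python
import random

Q = 101  # assumption: prime modulus for F_q


def dot(u, v):
    return sum(a * b for a, b in zip(u, v)) % Q


def verify(key, challenge, response):
    """Condition (9) with the secret key; only the Verifier (oracle) can evaluate this."""
    alpha, beta = key
    mu, tau = response
    return tau == (alpha * mu + dot(challenge, beta)) % Q


def recover_key(message, tag, challenge, oracle):
    """Try each candidate key of Lemma 3.1 with a deliberately non-authentic response.
    Such a response is acceptable for exactly one candidate: the real key."""
    mu = (dot(challenge, message) + 1) % Q  # mu != V . M
    for a0 in range(Q):
        b0 = [(s - a0 * m) % Q for s, m in zip(tag, message)]
        if oracle(challenge, (mu, (a0 * mu + dot(challenge, b0)) % Q)):
            return a0, b0
    raise ValueError("no candidate accepted; (M, S) is inconsistent with the oracle")


def forged_prover(key, fake_message):
    """The proving algorithm from the proof: acceptable responses computed from M'."""
    alpha, beta = key

    def prove(challenge):
        mu = dot(challenge, fake_message)
        return mu, (alpha * mu + dot(challenge, beta)) % Q
    return prove


rng = random.Random(7)
n = 5
secret = (rng.randrange(Q), [rng.randrange(Q) for _ in range(n)])
M = [rng.randrange(Q) for _ in range(n)]
S = [(b + secret[0] * m) % Q for b, m in zip(secret[1], M)]  # tag from equation (6)
queries = []


def oracle(challenge, response):
    queries.append(response)
    return verify(secret, challenge, response)


found = recover_key(M, S, [1] * n, oracle)
print(found == secret, len(queries) <= Q)  # prints: True True
```

After recovery, `forged_prover(found, M_prime)` answers every challenge acceptably while encoding a different message $M'$, so nearest-neighbour extraction returns $M'$.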

Even in the absence of a verification oracle, the Prover can guess the correct value of $\alpha$, which he
can do successfully with probability $1/q$. If he correctly guesses $\alpha$, he can create a non-extractable
proving algorithm. This implies that it is not possible to prove a theorem stating that any proving 
algorithm yields an extractor. However, we can prove meaningful reductions if we define the success 
probability of a proving algorithm to be the average success probability over the q possible keys 
that are consistent with the information given to the Prover. 
Suppose $\mathcal{P}$ is a (deterministic) proving algorithm for a message $M = (m_1, \dots, m_n) \in (\mathbb{F}_q)^n$. For each challenge $V$ and for every key $K = (\alpha, B) \in \mathrm{Possible}(M, S)$, define $\chi(V, K) = 1$ if $\mathcal{P}$ returns an acceptable response for the key $K$ given the challenge $V$, and define $\chi(V, K) = 0$ otherwise.

Since there are $\gamma q$ choices for the pair $(V, K)$, we define the average success probability $\mathrm{succ}_{\mathrm{avg}}(\mathcal{P})$ to be

$$\mathrm{succ}_{\mathrm{avg}}(\mathcal{P}) = \frac{\sum_{V \in \Gamma,\ K \in \mathrm{Possible}(M,S)} \chi(V, K)}{\gamma q}. \qquad (10)$$

Lemma 3.4. Suppose there are $D$ challenges $V$ for which $\mathcal{P}$ returns an authentic response, and hence there are $C = \gamma - D$ challenges for which $\mathcal{P}$ returns a response that is not authentic. Then

$$\mathrm{succ}_{\mathrm{avg}}(\mathcal{P}) \leq 1 - \frac{C(q-1)}{\gamma q}. \qquad (11)$$

Proof. If $V$ is a challenge for which $\mathcal{P}$ returns an authentic response, then $\chi(V, K) = 1$ for every $K \in \mathrm{Possible}(M, S)$. If $V$ is a challenge for which $\mathcal{P}$ does not return an authentic response, then Lemma 3.2 implies that $\chi(V, K) = 1$ for at most one $K$. Therefore,

$$\sum_{V \in \Gamma,\ K \in \mathrm{Possible}(M,S)} \chi(V, K) \leq C + qD = \gamma q - C(q-1).$$

The desired result now follows from (10). □
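A brute-force evaluation of (10) illustrates Lemma 3.4. The sketch below assumes a tiny prime field ($q = 5$, $n = 2$, Version 1 challenges, so $\gamma = 24$) and a hypothetical prover that is authentic except on $C = 7$ challenges, where it forges responses for a guessed value of $\alpha$. Because each forged response is acceptable for exactly one key, the bound (11) is attained with equality in this example.

```python
from fractions import Fraction
from itertools import product

Q, N = 5, 2  # assumption: tiny prime field and block count so all (V, K) pairs can be enumerated


def dot(u, v):
    return sum(a * b for a, b in zip(u, v)) % Q


def possible_keys(message, tags):
    return [(a0, [(s - a0 * m) % Q for s, m in zip(tags, message)]) for a0 in range(Q)]


def acceptable(key, challenge, response):
    alpha, beta = key
    mu, tau = response
    return tau == (alpha * mu + dot(challenge, beta)) % Q


def succ_avg(prover, message, tags, challenges):
    """Equation (10): fraction of pairs (V, K), K in Possible(M, S), with an acceptable response."""
    keys = possible_keys(message, tags)
    hits = sum(acceptable(k, v, prover(v)) for v in challenges for k in keys)
    return Fraction(hits, len(challenges) * len(keys))


def make_prover(message, tags, bad, guessed_alpha):
    """Hypothetical prover: authentic except on `bad`, where it forges for a guessed alpha."""
    guessed_beta = [(s - guessed_alpha * m) % Q for s, m in zip(tags, message)]

    def prover(v):
        if v in bad:
            mu = (dot(v, message) + 1) % Q  # non-authentic mu
            return mu, (guessed_alpha * mu + dot(v, guessed_beta)) % Q
        return dot(v, message), dot(v, tags)  # authentic, equations (7) and (8)
    return prover


challenges = [v for v in product(range(Q), repeat=N) if any(v)]  # gamma = Q^N - 1 = 24
M, S = [1, 3], [2, 4]
bad = set(challenges[:7])
s = succ_avg(make_prover(M, S, bad, guessed_alpha=2), M, S, challenges)
bound = 1 - Fraction(len(bad) * (Q - 1), len(challenges) * Q)  # right-hand side of (11)
print(s, bound)
```

Both printed quantities equal 23/30: the 17 authentic challenges contribute $17 \cdot 5$ accepted pairs and the 7 forged ones contribute one each, giving $92/120$.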



Let's now turn our attention to the response code. Even though a response is an ordered pair $(\mu, \tau)$, it suffices to consider only the values of $\mu$ in the extraction process (this follows because $\mu$, $K$ and $V$ uniquely determine $\tau$; see (9)). So we will define the response vector for a message $M$ to be $r^M = (M \cdot V : V \in \Gamma)$. Observe that this response vector is identical to the response vector in the Linear Combination Scheme.

Lemma 3.5. Suppose that $\mathcal{P}$ is a proving algorithm for the message $M$ in the Modified Shacham-Waters Scheme. Let $r^M = (M \cdot V : V \in \Gamma)$ and let $R'$ be the $\gamma$-tuple of responses computed by $\mathcal{P}$. Then

$$\mathrm{dist}(r^M, R') \leq \frac{(1 - \mathrm{succ}_{\mathrm{avg}}(\mathcal{P}))\gamma q}{q-1}.$$

Proof. Define $C$ as in Lemma 3.4. Since a co-ordinate of $r^M$ differs from the corresponding co-ordinate of $R'$ only when the response is non-authentic, it follows that $\mathrm{dist}(r^M, R') \leq C$. Equation (11) implies that

$$C \leq \frac{(1 - \mathrm{succ}_{\mathrm{avg}}(\mathcal{P}))\gamma q}{q-1},$$

from which the stated result follows. □



Theorem 3.6. Suppose that

$$\mathrm{succ}_{\mathrm{avg}}(\mathcal{P}) > 1 - \frac{(q-1)d^*}{2q\gamma}, \qquad (12)$$

where $d^*$ is the distance of the response code. Then the Extractor presented in Figure 8 will always output $\hat{m} = m$.
Figure 8: Extractor for the Modified Shacham-Waters Scheme

1. On input $\mathcal{P}$, compute the vector $R' = (\mu_V : V \in \Gamma)$, where $(\mu_V, \tau_V) = \mathcal{P}(V)$ for all $V \in \Gamma$.

2. Find $\hat{M} \in \mathcal{M}^*$ so that $\mathrm{dist}(R', r^{\hat{M}})$ is minimised.

3. Output $\hat{m} = e^{-1}(\hat{M})$.



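Proof. Denote $\epsilon = \mathrm{succ}_{\mathrm{avg}}(\mathcal{P})$, let $R'$ be the $\gamma$-tuple of responses computed by $\mathcal{P}$, and denote $\delta = \mathrm{dist}(r^M, R')$, where $M = e(m)$. We showed in Lemma 3.5 that

$$\delta \leq \frac{(1-\epsilon)\gamma q}{q-1}.$$

We want to prove that $\hat{M} = M$. We have that $r^{\hat{M}}$ is a codeword in $\mathcal{R}^*$ closest to $R'$. Since $r^M$ is a codeword such that $\mathrm{dist}(r^M, R') = \delta$, it must be the case that $\mathrm{dist}(r^{\hat{M}}, R') \leq \delta$. By the triangle inequality, we get

$$\mathrm{dist}(r^M, r^{\hat{M}}) \leq \mathrm{dist}(r^M, R') + \mathrm{dist}(r^{\hat{M}}, R') \leq \delta + \delta = 2\delta.$$

However,

$$2\delta \leq \frac{2(1-\epsilon)\gamma q}{q-1} < d^*,$$

where the last inequality follows from (12). Since $r^M$ and $r^{\hat{M}}$ are codewords at distance less than $d^*$ (which is the distance of the response code), it follows that $\hat{M} = M$ and the Extractor outputs $\hat{m} = e^{-1}(M) = m$, as desired. □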
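The Extractor of Figure 8 is nearest-neighbour decoding in the response code. The sketch below makes several assumptions that are not part of the scheme description: a prime field with $q = 5$, Version 1 challenges (all non-zero $V \in (\mathbb{F}_5)^4$), and a hypothetical Reed-Solomon-style encoding $e$ with $k = 2$ and $n = 4$. In this setting any two distinct codewords give response vectors at distance $q^n - q^{n-1} = 500$, so a prover that corrupts the $\mu$-component on 200 challenges is still decoded correctly, exactly as in the proof of Theorem 3.6.

```python
from itertools import product

Q = 5                    # assumption: small prime field F_5
POINTS = [1, 2, 3, 4]    # hypothetical encoding: evaluation points, so n = 4
MSG_LEN = 2              # message length k = 2


def encode(m):
    """Hypothetical e(m): evaluate the polynomial m[0] + m[1]*x + ... at each point."""
    return tuple(sum(c * pow(x, j, Q) for j, c in enumerate(m)) % Q for x in POINTS)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v)) % Q


CHALLENGES = [v for v in product(range(Q), repeat=len(POINTS)) if any(v)]  # Version 1
MESSAGES = list(product(range(Q), repeat=MSG_LEN))


def response_vector(codeword):
    """r^M = (M . V : V in Gamma); only mu-components, as in Figure 8."""
    return [dot(v, codeword) for v in CHALLENGES]


def extract(prover):
    """Figure 8: collect R', then decode to the nearest word of the response code."""
    received = [prover(v)[0] for v in CHALLENGES]  # step 1 (tau is ignored)

    def distance(m):
        return sum(a != b for a, b in zip(received, response_vector(encode(m))))

    return min(MESSAGES, key=distance)  # steps 2 and 3


m_true = (3, 1)
codeword = encode(m_true)
corrupted = set(CHALLENGES[:200])  # 200 wrong answers, fewer than d*/2 = 250


def prover(v):
    mu = dot(v, codeword)
    return ((mu + 1) % Q if v in corrupted else mu), 0  # tau = 0: never read by the Extractor


print(extract(prover) == m_true)  # prints: True
```

The brute-force search over all $q^k$ messages is only practical for toy sizes; it is meant to show the decoding rule, not an efficient extractor.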



4 Numerical Computations and Estimates 

We have provided sufficient conditions for extraction to succeed for several POR schemes, based on
the success probability of the proving algorithm. Here we look a bit more closely at these numerical
conditions and provide some useful comparisons and estimates for the different schemes we have
studied.

We will consider three schemes. We have the following observations, which can be verified in a straightforward manner:

1. For the Multiblock Challenge Scheme, extraction will succeed if

$$\mathrm{succ}(\mathcal{P}) > \delta_0 = \frac{1}{2} + \frac{\binom{n-d}{\ell}}{2\binom{n}{\ell}}.$$

This is stated in Theorem 2.4.

2. For the Linear Combination Scheme (Version 2), extraction will succeed if

$$\mathrm{succ}(\mathcal{P}) > \delta_1 = \left(1 - \frac{1}{q}\right)\delta_0 + \frac{1}{q}.$$

This follows from Theorem 2.9.

3. For the Modified Shacham-Waters Scheme (Version 2), extraction will succeed if

$$\mathrm{succ}_{\mathrm{avg}}(\mathcal{P}) > \delta_2 = \left(1 - \frac{1}{q}\right)^2\delta_0 + \frac{2}{q} - \frac{1}{q^2}.$$

This follows from Theorem 3.6, using the estimate for $d^*$ given in (5).

It is clear that $\delta_0$, $\delta_1$ and $\delta_2$ are extremely close for any reasonable value of $q$: since $\delta_1 = 1 - (1 - 1/q)(1 - \delta_0)$, $\delta_2 = 1 - (1 - 1/q)^2(1 - \delta_0)$ and $1 - \delta_0 \leq 1/2$, they differ by at most $1/q$. Therefore we will confine our subsequent analysis to $\delta_0$ and state our results in terms of the Multiblock Challenge Scheme. The formula for $\delta_0$ is relatively simple, but it is complicated somewhat by the binomial coefficients. Therefore, it may be useful to define an estimate that does not involve binomial coefficients.
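The three thresholds can be compared exactly with rational arithmetic. The sketch below implements $\delta_0 = \frac{1}{2} + \binom{n-d}{\ell}/(2\binom{n}{\ell})$, $\delta_1 = (1 - 1/q)\delta_0 + 1/q$ and $\delta_2 = (1 - 1/q)^2\delta_0 + 2/q - 1/q^2$ using Python's `fractions.Fraction`; the parameter values are hypothetical.

```python
from fractions import Fraction
from math import comb


def delta0(n, d, ell):
    """Multiblock Challenge Scheme threshold (Theorem 2.4)."""
    return Fraction(1, 2) + Fraction(comb(n - d, ell), 2 * comb(n, ell))


def delta1(n, d, ell, q):
    """Linear Combination Scheme, Version 2 (uses the estimate (5) for d*)."""
    return (1 - Fraction(1, q)) * delta0(n, d, ell) + Fraction(1, q)


def delta2(n, d, ell, q):
    """Modified Shacham-Waters Scheme, Version 2 (threshold on succ_avg)."""
    return (1 - Fraction(1, q)) ** 2 * delta0(n, d, ell) + Fraction(2, q) - Fraction(1, q * q)


n, d, ell = 20, 5, 3                       # hypothetical toy parameters
for q in (2, 101, 2 ** 61 - 1):            # tiny, small and large prime field sizes
    print(q, float(delta0(n, d, ell)), float(delta1(n, d, ell, q)), float(delta2(n, d, ell, q)))
```

With a tiny field the gap is visible, while for $q = 2^{61} - 1$ the three thresholds agree to the precision a float can display.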

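Theorem 4.1. Denote $\epsilon = 1 - \mathrm{succ}(\mathcal{P})$, where $0 < \epsilon < 1/2$. Suppose that the following inequality holds in the Multiblock Challenge Scheme:

$$n < \frac{\ell d}{-\ln(1 - 2\epsilon)}. \qquad (13)$$

Then the Extractor will always succeed.

Proof. From (13), we obtain

$$\ln(1 - 2\epsilon) > -\frac{\ell d}{n}.$$

Now, since $-x > \ln(1 - x)$ for $0 < x < 1$, we obtain

$$\ln(1 - 2\epsilon) > \ell \ln\left(1 - \frac{d}{n}\right).$$

Exponentiating both sides of this inequality, we have

$$1 - 2\epsilon > \left(1 - \frac{d}{n}\right)^{\ell}.$$

It is easy to prove that

$$\left(1 - \frac{d}{n}\right)^{\ell} \geq \frac{\binom{n-d}{\ell}}{\binom{n}{\ell}},$$

since $\binom{n-d}{\ell}/\binom{n}{\ell} = \prod_{i=0}^{\ell-1}\frac{n-d-i}{n-i}$ and each factor is at most $(n-d)/n$. So it follows that

$$1 - 2\epsilon > \frac{\binom{n-d}{\ell}}{\binom{n}{\ell}}.$$

From this, we obtain

$$\mathrm{succ}(\mathcal{P}) = 1 - \epsilon > \frac{1}{2} + \frac{\binom{n-d}{\ell}}{2\binom{n}{\ell}} = \delta_0,$$

and hence the Extractor will always succeed. □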
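Entries of the kind shown in Table 1 can be recomputed with a short script. The Python below is our own illustration (function names are hypothetical): it finds the largest $n$ satisfying $\mathrm{succ}(\mathcal{P}) > \delta_0$ (Theorem 2.4) by bisection, evaluating $\binom{n-d}{\ell}/\binom{n}{\ell} = \prod_{i=0}^{\ell-1}\frac{n-d-i}{n-i}$ in log space, and compares it with the largest integer satisfying the closed-form condition $n < \ell d/(-\ln(1 - 2\epsilon))$ of Theorem 4.1. It assumes $1/2 < \mathrm{succ}(\mathcal{P}) < 1$.

```python
import math


def binomial_ratio(n, d, ell):
    """C(n-d, ell) / C(n, ell) = prod_{i<ell} (n-d-i)/(n-i), summed in log space."""
    if n - d < ell:
        return 0.0  # C(n-d, ell) = 0
    return math.exp(math.fsum(math.log1p(-d / (n - i)) for i in range(ell)))


def max_n_thm24(d, ell, succ):
    """Largest n with succ > delta_0, i.e. C(n-d, ell)/C(n, ell) < 1 - 2*eps."""
    if not 0.5 < succ < 1:
        raise ValueError("need 1/2 < succ < 1")
    target = 1 - 2 * (1 - succ)

    def ok(n):
        return binomial_ratio(n, d, ell) < target

    lo = ell + d - 1          # the ratio is 0 here, so ok(lo) holds
    hi = 2 * lo
    while ok(hi):             # grow until the condition fails
        lo, hi = hi, 2 * hi
    while hi - lo > 1:        # invariant: ok(lo) and not ok(hi)
        mid = (lo + hi) // 2
        if ok(mid):
            lo = mid
        else:
            hi = mid
    return lo


def max_n_thm41(d, ell, succ):
    """Largest integer n with n < ell*d / (-ln(1 - 2*eps)), condition (13)."""
    eps = 1 - succ
    return math.ceil(ell * d / -math.log(1 - 2 * eps)) - 1


print(max_n_thm24(100, 100, 0.6), max_n_thm41(100, 100, 0.6))
```

For $\ell = d = 100$ and $\mathrm{succ}(\mathcal{P}) = 0.6$ the Theorem 4.1 value is 6213, matching Table 1, and the Theorem 2.4 value should land on (or within one of) the tabulated 6313, depending on rounding conventions.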
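| ℓ | d | succ(P) | n (Thm. 2.4) | n (Thm. 4.1) | ℓ | d | succ(P) | n (Thm. 2.4) | n (Thm. 4.1) |
|---|---|---|---|---|---|---|---|---|---|
| 10000 | 10000 | 0.6 | 62143493 | 62133493 | 10000 | 1000 | 0.6 | 6218850 | 6213349 |
| 10000 | 10000 | 0.7 | 109145666 | 109135666 | 10000 | 1000 | 0.7 | 10919066 | 10913566 |
| 10000 | 10000 | 0.8 | 195771518 | 195761518 | 10000 | 1000 | 0.8 | 19581651 | 19576151 |
| 10000 | 10000 | 0.9 | 448152011 | 448142011 | 10000 | 1000 | 0.9 | 44819701 | 44814201 |
| 10000 | 10000 | 0.99 | 4949841645 | 4949831645 | 10000 | 1000 | 0.99 | 494988664 | 494983164 |
| 1000 | 10000 | 0.6 | 6218850 | 6213349 | 1000 | 1000 | 0.6 | 622334 | 6213349 |
| 1000 | 10000 | 0.7 | 10919066 | 10913567 | 1000 | 1000 | 0.7 | 1092356 | 10913567 |
| 1000 | 10000 | 0.8 | 19581651 | 19576152 | 1000 | 1000 | 0.8 | 1958614 | 19576152 |
| 1000 | 10000 | 0.9 | 44819700 | 44814201 | 1000 | 1000 | 0.9 | 4482419 | 4481420 |
| 1000 | 10000 | 0.99 | 494988664 | 494983164 | 1000 | 1000 | 0.99 | 49499315 | 49498316 |
| 100 | 10000 | 0.6 | 626398 | 621334 | 100 | 1000 | 0.6 | 62684 | 62133 |
| 100 | 10000 | 0.7 | 1096413 | 1091357 | 100 | 1000 | 0.7 | 109685 | 109135 |
| 100 | 10000 | 0.8 | 1962669 | 1957615 | 100 | 1000 | 0.8 | 196311 | 195761 |
| 100 | 10000 | 0.9 | 4486471 | 4481420 | 100 | 1000 | 0.9 | 448691 | 448142 |
| 100 | 10000 | 0.99 | 49503366 | 49498316 | 100 | 1000 | 0.99 | 4950381 | 4949831 |
| 50 | 10000 | 0.6 | 315719 | 310667 | 50 | 1000 | 0.6 | 31594 | 31068 |
| 50 | 10000 | 0.7 | 550718 | 5456783 | 50 | 1000 | 0.7 | 55093 | 545678 |
| 50 | 10000 | 0.8 | 983840 | 9788076 | 50 | 1000 | 0.8 | 98406 | 978807 |
| 50 | 10000 | 0.9 | 2245736 | 2240710 | 50 | 1000 | 0.9 | 224599 | 224071 |
| 50 | 10000 | 0.99 | 24754183 | 24749158 | 50 | 1000 | 0.99 | 2475440 | 2474916 |
| 10000 | 100 | 0.6 | 626398 | 621334 | 10000 | 10 | 0.6 | 67272 | 62133 |
| 10000 | 100 | 0.7 | 1096413 | 1091356 | 10000 | 10 | 0.7 | 114216 | 109136 |
| 10000 | 100 | 0.8 | 1962668 | 1957615 | 10000 | 10 | 0.8 | 200808 | 195761 |
| 10000 | 100 | 0.9 | 4486471 | 4481420 | 10000 | 10 | 0.9 | 453165 | 448142 |
| 10000 | 100 | 0.99 | 4950336 | 49498316 | 10000 | 10 | 0.99 | 4954838 | 4949832 |
| 1000 | 100 | 0.6 | 62684 | 62133 | 1000 | 10 | 0.6 | 6731 | 6213 |
| 1000 | 100 | 0.7 | 109685 | 109135 | 1000 | 10 | 0.7 | 11425 | 10914 |
| 1000 | 100 | 0.8 | 196311 | 195761 | 1000 | 10 | 0.8 | 20084 | 19576 |
| 1000 | 100 | 0.9 | 448692 | 448142 | 1000 | 10 | 0.9 | 45320 | 44814 |
| 1000 | 100 | 0.99 | 4950381 | 4949831 | 1000 | 10 | 0.99 | 495488 | 494983 |
| 100 | 100 | 0.6 | 6313 | 6213 | 100 | 10 | 0.6 | 677 | 621 |
| 100 | 100 | 0.7 | 11013 | 10913 | 100 | 10 | 0.7 | 1146 | 1091 |
| 100 | 100 | 0.8 | 19675 | 19576 | 100 | 10 | 0.8 | 2012 | 1958 |
| 100 | 100 | 0.9 | 44913 | 44814 | 100 | 10 | 0.9 | 4536 | 4481 |
| 100 | 100 | 0.99 | 495082 | 494983 | 100 | 10 | 0.99 | 49552 | 49498 |
| 50 | 100 | 0.6 | 3181 | 3106 | 50 | 10 | 0.6 | 341 | 311 |
| 50 | 100 | 0.7 | 5531 | 5456 | 50 | 10 | 0.7 | 576 | 546 |
| 50 | 100 | 0.8 | 9862 | 9788 | 50 | 10 | 0.8 | 1009 | 979 |
| 50 | 100 | 0.9 | 22481 | 22407 | 50 | 10 | 0.9 | 2270 | 2240 |
| 50 | 100 | 0.99 | 247565 | 247492 | 50 | 10 | 0.99 | 24779 | 24749 |

Table 1: Values of n for which the Extractor will always succeed.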
Table 1 presents values of $n$ (the length of an encoded message), for different values of $\ell$ (the
hamming weight of the challenge), $d$ (the distance of the code) and the success probability of the
proving algorithm, such that the Extractor is guaranteed to succeed. We tabulate the value of $n$
as specified by Theorems 2.4 and 4.1. We see, for a wide range of parameters, that the estimate
obtained in Theorem 4.1 is very close to the earlier value computed in Theorem 2.4.

5 Estimating the Success Probability of a Prover 

5.1 Hypothesis Testing 

The essential purpose of a POR scheme is to assure the user that their file is indeed being stored
correctly, i.e., in such a manner that the user can recover the entire file if desired. We have
considered several schemes for testing whether this is the case, but in order to use these schemes
appropriately it is also necessary to pay attention to how we interpret the results of these tests.
Theorem 2.1 tells us that extraction is possible for the Basic Scheme whenever $\mathrm{succ}(\mathcal{P})$ is at least
$(n - \lceil d/2 \rceil + 1)/n$; hence the information we would like to obtain from using the Basic Scheme is
whether or not $\mathrm{succ}(\mathcal{P})$ exceeds the necessary threshold. Similarly, for the Multiblock Challenge
Scheme or the Linear Combination Scheme, we can compute a value $\omega$ such that extraction is
possible whenever $\mathrm{succ}(\mathcal{P}) > \omega$. We can calculate $\mathrm{succ}(\mathcal{P})$ for a given proving algorithm $\mathcal{P}$ if
we know the values of $\mathcal{P}$'s response $\mathcal{P}(c)$ for every possible challenge $c \in \Gamma$. However, the whole
purpose of a POR scheme is to provide reassurance that $\mathrm{succ}(\mathcal{P})$ is sufficiently large without having
to request $\mathcal{P}(c)$ for all $c \in \Gamma$. Given the prover's responses to some subset of possible challenges,
the user wishes to make a judgement as to whether he/she is satisfied that $\mathrm{succ}(\mathcal{P})$ is acceptably
high. This takes us straight into the realm of classical statistical techniques such as hypothesis
testing [U [5] . 

Suppose the prover has given responses $\mathcal{P}(c)$ to $t$ challenges $c$ chosen uniformly at random without replacement from $\Gamma$, and that $g$ of these responses are found to be correct. We are concerned that the prover's success rate may not be high enough, so we are looking for evidence to convince us that in fact it is sufficiently high to permit extraction. In other words, writing $\omega$ for the extraction threshold described above, we wish to distinguish the null hypothesis

$$H_0 : \mathrm{succ}(\mathcal{P}) \leq \omega$$

from the alternative hypothesis

$$H_1 : \mathrm{succ}(\mathcal{P}) > \omega. \qquad (14)$$

Suppose that $H_0$ is true. Then at most $\lfloor \omega\gamma \rfloor$ of the $\gamma$ possible challenges are answered correctly, so the probability that the number of correct responses is at least $g$ is itself at most

$$\sum_{i=g}^{t}\frac{\binom{\lfloor \omega\gamma \rfloor}{i}\binom{\gamma - \lfloor \omega\gamma \rfloor}{t-i}}{\binom{\gamma}{t}}.$$

If this probability is less than 0.05, then we reject $H_0$ and instead accept the alternative hypothesis (namely that $\mathrm{succ}(\mathcal{P})$ is sufficiently high to permit extraction); in this case we conclude that the server is storing the file appropriately. If the probability is greater than 0.05, then there is insufficient evidence to reject $H_0$ at the 5% significance level (so we continue to suspect that the server is perhaps not storing the file adequately).

Alternatively, if we choose the challenges uniformly at random with replacement, then the condition for rejecting the null hypothesis becomes

$$\sum_{i=g}^{t}\binom{t}{i}\omega^{i}(1-\omega)^{t-i} < 0.05.$$

Example 1. Suppose that for the Basic Scheme $n = 1000$, and that the minimum distance of the response code is 400. Then by Theorem 2.1 we find that extraction is possible whenever $\mathrm{succ}(\mathcal{P})$ is greater than 0.8. Suppose the prover responds to 100 challenges that have been chosen uniformly at random with replacement, and that 87 of the responses were correct. We find that

$$\sum_{i=87}^{100}\binom{100}{i}(0.8)^{i}(0.2)^{100-i} < 0.05.$$

Thus, in this case there is sufficient evidence to reject the null hypothesis at the 5% significance level, and so we conclude that the file is in fact being stored correctly.

On the other hand, if only 86 of the responses were correct, we observe that

$$\sum_{i=86}^{100}\binom{100}{i}(0.8)^{i}(0.2)^{100-i} > 0.05.$$

In this case there is not enough evidence to reject the null hypothesis at the 5% significance level, and so we continue to suspect that the server is not storing the file adequately.

The benefit of this statistical approach is that, given the observed responses to the challenges, for
any desired value of $\alpha$ we can construct a hypothesis test for which the probability of inappropriately
rejecting the null hypothesis (and hence failing to catch a prover that does not permit extraction)
is necessarily less than $\alpha$.¹ This is the case regardless of the true value of $\mathrm{succ}(\mathcal{P})$, and we do not
need to make any a priori assumptions about this value.

In Table 2 we give examples of a range of possible results of the challenge process and the
corresponding outcomes in terms of whether the null hypothesis is rejected at either the 5% or 1%
significance level.
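Both tail probabilities can be computed directly with exact integer binomial coefficients. The sketch below is a minimal illustration (function names are ours): `tail_without_replacement` is the hypergeometric bound, taking at most $\lfloor \omega\gamma \rfloor$ correctly answered challenges under $H_0$, and `tail_with_replacement` is the binomial sum. It then applies the with-replacement test to the two cases of Example 1.

```python
from fractions import Fraction
from math import comb, floor


def tail_with_replacement(t, g, omega):
    """Binomial(t, omega) upper tail P[X >= g]: challenges drawn with replacement."""
    return sum(comb(t, i) * omega ** i * (1 - omega) ** (t - i) for i in range(g, t + 1))


def tail_without_replacement(gamma, t, g, omega):
    """Hypergeometric upper tail: t distinct challenges out of gamma, of which at most
    floor(omega * gamma) are answered correctly under H0."""
    good = floor(Fraction(str(omega)) * gamma)  # exact, avoids float rounding in omega * gamma
    favourable = sum(comb(good, i) * comb(gamma - good, t - i) for i in range(g, t + 1))
    return favourable / comb(gamma, t)  # exact big-integer division, correctly rounded


def reject_null(p_value, alpha=0.05):
    """Reject H0: succ(P) <= omega when the tail probability is below alpha."""
    return p_value < alpha


for g in (87, 86):  # Example 1: omega = 0.8, t = 100
    p = tail_with_replacement(100, g, 0.8)
    print(g, round(p, 4), reject_null(p))
```

With 87 correct responses $H_0$ is rejected and with 86 it is not, as in Example 1. For a large challenge space the hypergeometric and binomial tails are nearly identical, so the simpler binomial form is usually adequate in practice.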

5.2 Confidence Intervals 

Another closely-related way to portray the information provided by the sample of responses to challenges is through the use of confidence intervals. We define a 95% lower confidence bound $\theta_L$ by

$$\theta_L = \sup\left\{\theta : \sum_{i=g}^{t}\binom{t}{i}\theta^{i}(1-\theta)^{t-i} < 0.05\right\}.$$

This represents the largest possible value for $\mathrm{succ}(\mathcal{P})$ for which the probability of obtaining $g$ or more correct responses in a sample of size $t$ is less than 0.05. Then the decision process for the hypothesis test described in Section 5.1 consists of rejecting the null hypothesis $H_0 : \mathrm{succ}(\mathcal{P}) \leq \omega$ whenever $\omega < \theta_L$, since if $\omega$ is less than the critical value $\theta_L$ we know that the probability of a prover with success rate at most $\omega$ providing $g$ or more correct responses is less than 0.05.

¹Here we refer to probability over the set of all possible choices of $t$ challenges.

The interval $(\theta_L, 1]$ is a 95% confidence interval for $\mathrm{succ}(\mathcal{P})$: if a large number of samples of size $t$ were made and the corresponding intervals were calculated using this approach, then you would expect the resulting intervals to contain the true value of $\mathrm{succ}(\mathcal{P})$ at least 95% of the time. The hypothesis test can be expressed in terms of the confidence interval $(\theta_L, 1]$ by stating that we reject $H_0$ whenever $\omega$ does not lie in this interval.

Example 2. Suppose we have $n = 1000$ and $d = 400$ as in Example 1, and suppose that 90 of the 100 responses are correct. Then

$$\theta_L = \sup\left\{\theta : \sum_{i=90}^{100}\binom{100}{i}\theta^{i}(1-\theta)^{100-i} < 0.05\right\} \approx 0.836.$$

Then a 95% confidence interval for $\mathrm{succ}(\mathcal{P})$ is $(0.836, 1]$, and hence we reject the null hypothesis, as the threshold $(n - \lceil d/2 \rceil)/n = 0.8$ does not lie in this interval.
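The bound $\theta_L$ has no closed form in general, but for $g \geq 1$ the binomial upper tail is non-decreasing in $\theta$, so $\theta_L$ can be located by bisection. The sketch below is an illustration with hypothetical function names; the 60 bisection steps are an arbitrary choice that exhausts double precision.

```python
from math import comb


def upper_tail(t, g, theta):
    """P[X >= g] for X ~ Binomial(t, theta); non-decreasing in theta."""
    return sum(comb(t, i) * theta ** i * (1 - theta) ** (t - i) for i in range(g, t + 1))


def lower_confidence_bound(t, g, alpha=0.05, iterations=60):
    """theta_L = sup{theta : P_theta[X >= g] < alpha}, located by bisection.
    For g >= 1 the set {theta : tail < alpha} is an interval starting at 0."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if upper_tail(t, g, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return lo


theta_L = lower_confidence_bound(100, 90)  # Example 2: 90 correct out of 100
print(round(theta_L, 3), 0.8 < theta_L)    # the threshold 0.8 lies outside (theta_L, 1]
```

For Example 2 this returns a value close to 0.836. Passing `alpha=0.01` gives the corresponding 99% lower bound.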



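| ω | t | g | α = 0.05 | α = 0.01 | ω | t | g | α = 0.05 | α = 0.01 |
|---|---|---|---|---|---|---|---|---|---|
| 0.8 | 100 | 100 | ✓ | ✓ | 0.9 | 100 | 100 | ✓ | ✓ |
| 0.8 | 100 | 95 | ✓ | ✓ | 0.9 | 100 | 95 | ✗ | ✗ |
| 0.8 | 100 | 90 | ✗ | ✗ | 0.9 | 100 | 90 | ✗ | ✗ |
| 0.8 | 100 | 85 | ✗ | ✗ | 0.9 | 100 | 85 | ✗ | ✗ |
| 0.8 | 100 | 80 | ✗ | ✗ | 0.9 | 100 | 80 | ✗ | ✗ |
| 0.8 | 200 | 180 | ✓ | ✓ | 0.9 | 200 | 200 | ✓ | ✓ |
| 0.8 | 200 | 175 | ✓ | ✓ | 0.9 | 200 | 195 | ✓ | ✓ |
| 0.8 | 200 | 170 | ✓ | ✗ | 0.9 | 200 | 190 | ✓ | ✓ |
| 0.8 | 200 | 165 | ✗ | ✗ | 0.9 | 200 | 185 | ✗ | ✗ |
| 0.8 | 200 | 160 | ✗ | ✗ | 0.9 | 200 | 180 | ✗ | ✗ |
| 0.8 | 500 | 435 | ✓ | ✓ | 0.9 | 500 | 480 | ✓ | ✓ |
| 0.8 | 500 | 430 | ✓ | ✓ | 0.9 | 500 | 475 | ✓ | ✓ |
| 0.8 | 500 | 425 | ✓ | ✓ | 0.9 | 500 | 470 | ✓ | ✓ |
| 0.8 | 500 | 420 | ✓ | ✗ | 0.9 | 500 | 465 | ✓ | ✗ |
| 0.8 | 500 | 415 | ✗ | ✗ | 0.9 | 500 | 460 | ✗ | ✗ |

Table 2: Outcomes of hypothesis testing for a range of responses. Here ω is the threshold in the null hypothesis $H_0 : \mathrm{succ}(\mathcal{P}) \leq \omega$, t is the number of challenges and g is the number of correct responses. The columns headed by values of α contain a tick (✓) if $H_0$ is rejected at the corresponding significance level, and a cross (✗) otherwise.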



5.3 Reacting to a Suspect Prover 

One question that has not always been directly considered is what action to take when a prover is
suspected of cheating. In the framework of Section 5.1 this becomes the problem of what to do in
the case where there is insufficient evidence to reject the null hypothesis. There are various possible
options at this point, and the choice of option will depend on factors such as the reason for storing
the file, and any costs and inconvenience that might be associated with ceasing to
use that server, or with switching to another storage provider. For example, if a server is simply
being used as a backup service for non-critical data and there is a high overhead associated with
switching storage providers, then a user will not want to be overhasty in taking action against a
possibly innocent server. In this case an appropriate action in the first instance might be to seek
more responses to challenges in order to avoid the possibility that the earlier set of responses were
unrepresentative of the reliability of the prover in general.

5.4 Comparison with Approaches Followed in the Literature 

Ateniese et al. [1] observe that if $\mathrm{succ}(\mathcal{P}) \leq (\gamma - c)/\gamma$ (with $c \in \mathbb{Z}$), then when $\mathcal{P}$ is queried on $t$ possible challenges chosen uniformly at random without replacement, the probability $\Pr_{\mathrm{bad}}$ that at least one incorrect response is observed satisfies

$$\Pr_{\mathrm{bad}} \geq 1 - \frac{\binom{\gamma - c}{t}}{\binom{\gamma}{t}},$$

and they note that

$$1 - \frac{\binom{\gamma - c}{t}}{\binom{\gamma}{t}} \geq 1 - \left(\frac{\gamma - c}{\gamma}\right)^{t}.$$

As examples of parameters, they point out that if $\mathrm{succ}(\mathcal{P}) = 0.99$, then achieving $\Pr_{\mathrm{bad}} = 0.95$ requires $t = 300$, and $\Pr_{\mathrm{bad}} = 0.99$ requires $t = 460$. They comment that the required number $t$ is in fact independent of $\gamma$, since it is based instead on the required threshold for $\mathrm{succ}(\mathcal{P})$; this observation applies equally to our analysis.

Dodis, Vadhan and Wichs [6] use a similar approach to Ateniese et al., but in addition propose the use of a hitting sampler, which amounts to choosing the $t$ elements to sample from a specified distribution that contains fewer than $\binom{\gamma}{t}$ possible sample sets but still guarantees that $\Pr_{\mathrm{bad}}$ is higher than some specified value for a given value of $\mathrm{succ}(\mathcal{P})$ that is less than 1.

This analysis can be interpreted in the context of a hypothesis test to distinguish the null hypothesis

$$H_0 : \mathrm{succ}(\mathcal{P}) \leq 0.99$$

from the alternative hypothesis

$$H_1 : \mathrm{succ}(\mathcal{P}) > 0.99.$$

If 300 challenges are made and all the responses are correct, then a 95% confidence interval for $\mathrm{succ}(\mathcal{P})$ is $(0.99006, 1]$, so there is enough evidence to reject the null hypothesis at the 5% significance level. However, a 99% confidence interval for $\mathrm{succ}(\mathcal{P})$ is $(0.9848, 1]$ (since $0.01^{1/300} \approx 0.9848$), so there is insufficient evidence to reject the null hypothesis at the 1% significance level. If, on the other hand, 460 challenges were made and all the responses were correct, then a 99% confidence interval for $\mathrm{succ}(\mathcal{P})$ is $(0.99003, 1]$, and so in this case there is enough evidence to reject the null hypothesis at the 1% significance level.

We note that this is a special case of the analysis in Sections 5.1 and 5.2. Specifically, Ateniese et al. are focusing on determining the smallest number of challenges for which an entirely correct response constitutes sufficient evidence to reject the null hypothesis at the desired significance level. While this does result in the smallest number of challenges for which there is still the potential to reassure the user as to the appropriate behaviour of the prover, it has the drawback that even a single incorrect response results in failure to reject the null hypothesis, regardless of whether it is true. Taking a larger sample size has the benefit of increasing the probability that a false null hypothesis is rejected, without adversely affecting the probability that a true $H_0$ fails to be rejected. For example, from Table 2 we see that if 90 correct responses out of 100 are observed then there is insufficient evidence to reject the hypothesis $\mathrm{succ}(\mathcal{P}) \leq 0.8$ at the 5% significance level. However, if 180 correct responses out of 200 are observed then there is sufficient evidence to reject this null hypothesis at the 5% significance level (in fact we even have enough evidence to reject $H_0$ at the 1% significance level).
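The figures quoted in this subsection follow from two one-line formulas, sketched below in Python with illustrative function names: the smallest $t$ with $1 - \mathrm{succ}(\mathcal{P})^t \geq \Pr_{\mathrm{bad}}$ (the bound $1 - ((\gamma - c)/\gamma)^t$ with $(\gamma - c)/\gamma = \mathrm{succ}(\mathcal{P})$, in which $\gamma$ does not appear), and, when all $t$ responses are correct, the lower confidence bound $\theta_L = \alpha^{1/t}$, because the probability of $t$ correct responses is $\theta^t$.

```python
import math


def min_challenges(succ, target):
    """Smallest t with 1 - succ**t >= target; gamma does not enter the calculation."""
    return math.ceil(math.log(1 - target) / math.log(succ))


def lower_bound_all_correct(t, alpha):
    """theta_L when all t responses are correct: P_theta[X >= t] = theta**t < alpha."""
    return alpha ** (1 / t)


print(min_challenges(0.99, 0.95), min_challenges(0.99, 0.99))
for t, alpha in [(300, 0.05), (300, 0.01), (460, 0.01)]:
    print(t, alpha, round(lower_bound_all_correct(t, alpha), 5))
```

The first line gives 299 and 459, so the rounded values 300 and 460 quoted above are slightly conservative; the three confidence bounds come out as approximately 0.99006, 0.98477 and 0.99004.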

6 A Lower Bound on Storage and Communication Requirements 

In this section, we prove a bound that applies to keyed POR schemes. Suppose that $\mathbf{M}$ is a random variable corresponding to a randomly chosen unencoded message $m$. Let $\mathbf{V}$ be a random variable denoting the information stored by the Verifier (i.e., the key), and let $\mathbf{R}$ be a random variable corresponding to the computations performed by an extractor. It is obvious that the probability that $m$ can correctly be reconstructed is $2^{-H(\mathbf{M} \mid \mathbf{V}, \mathbf{R})}$. Now, from basic entropy inequalities, we have

$$\begin{aligned} H(\mathbf{M} \mid \mathbf{V}, \mathbf{R}) &= H(\mathbf{M}, \mathbf{V}, \mathbf{R}) - H(\mathbf{V}, \mathbf{R}) \\ &\geq H(\mathbf{M}, \mathbf{V}, \mathbf{R}) - H(\mathbf{V}) - H(\mathbf{R}) \\ &\geq H(\mathbf{M}) - H(\mathbf{V}) - H(\mathbf{R}). \end{aligned}$$

Suppose that the message can be reconstructed by the extractor with probability 1. Then we have $H(\mathbf{M} \mid \mathbf{V}, \mathbf{R}) = 0$. The inequality proven above implies that

$$H(\mathbf{M}) \leq H(\mathbf{V}) + H(\mathbf{R}). \qquad (15)$$

Now suppose that the extractor is a black-box extractor. In this situation, we have that

$$H(\mathbf{R}) \leq \gamma \log_2 |\Lambda|, \qquad (16)$$

since there are $\gamma$ possible challenges and each response is from the set $\Lambda$. The message $m$ is a random vector in $(\mathbb{F}_q)^k$, so

$$H(\mathbf{M}) = k \log_2 q. \qquad (17)$$

Therefore, combining (15), (16) and (17), we have the following result.

Theorem 6.1. Suppose we have a keyed POR scheme where the message is a random vector in $(\mathbb{F}_q)^k$, there are $\gamma$ possible challenges and each response is from the set $\Lambda$. Suppose that a black-box extractor succeeds with probability equal to 1. Then the entropy of the verifier's storage, denoted $H(\mathbf{V})$, satisfies the inequality

$$H(\mathbf{V}) \geq k \log_2 q - \gamma \log_2 |\Lambda|.$$
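Theorem 6.1 is easy to evaluate for concrete parameters. The sketch below is a hypothetical helper; the example parameters ($k = 10^4$ blocks over $q = 2^{16}$, $\gamma = 1000$ challenges, and responses in $(\mathbb{F}_q)^2$ as in the Modified Shacham-Waters Scheme) are illustrative assumptions, not values from the analysis above. When $\gamma$ is large, for instance $\gamma = q^n - 1$ in Version 1, the right-hand side is negative and the bound says nothing.

```python
import math


def verifier_storage_lower_bound(k, q, gamma, response_space_size):
    """Theorem 6.1: H(V) >= k*log2(q) - gamma*log2(|Lambda|), in bits.
    Entropy is never negative, so a negative right-hand side is reported as 0 (vacuous)."""
    bound = k * math.log2(q) - gamma * math.log2(response_space_size)
    return max(0.0, bound)


q = 2 ** 16
print(verifier_storage_lower_bound(10 ** 4, q, 1000, q ** 2))     # 160000 - 32000 bits
print(verifier_storage_lower_bound(10 ** 4, q, 10 ** 6, q ** 2))  # vacuous: 0.0
```

The trade-off is visible directly: each additional challenge can reduce the required verifier storage by at most $\log_2|\Lambda|$ bits.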

We mentioned previously that Naor and Rothblum [11] proved a lower bound for a weaker form of POR-type protocol, termed an "authenticator". As noted in [6], the Naor-Rothblum bound also applies to POR schemes. Phrased in terms of entropy, their bound states that

$$H(\mathbf{M}) \leq H(\mathbf{V}) \cdot H(\mathbf{R}),$$

which is a weaker bound than (15) whenever $H(\mathbf{V}) \geq 2$ and $H(\mathbf{R}) \geq 2$.

In the case of an unkeyed scheme, the extractor is only given access to the proving algorithm. Therefore, $H(\mathbf{M} \mid \mathbf{R}) = 0$ if a black-box extractor succeeds with probability equal to 1. From this, it follows that $H(\mathbf{M}) \leq H(\mathbf{R})$ in this situation.

7 Conclusion 

We have performed a comprehensive analysis of the extraction properties of unconditionally secure 
POR schemes, and established a methodology that is applicable to the analysis of further new 
schemes. What constitutes "good" parameters for such a scheme depends on the precise application, 
but our framework allows a flexible trade-off between parameters. One direction of possible future 
interest would be to consider the construction of further keyed POR schemes with a view to reducing 
the user's key storage requirements. 
Appendix

Table 3: Notation used in this paper

| Symbol | Meaning |
|---|---|
| $q$ | order of underlying finite field |
| $m$ | message |
| $m_i$ | message block |
| | message space |
| $k$ | length of a message |
| $M$ | encoded message |
| $\mathcal{M}^*$ | encoded message space |
| $n$ | length of an encoded message |
| $d$ | distance of the encoded message space |
| $c$ | challenge |
| $\Gamma$ | challenge space |
| $\gamma$ | number of possible challenges |
| $r$ | response |
| $P$ | response function |
| $r^M$ | response vector for encoded message $M$ |
| $\Lambda$ | response space |
| $\mathcal{R}^*$ | response code |
| $d^*$ | distance of the response code |
| $\mathcal{P}$ | proving algorithm |
| $\mathrm{succ}(\mathcal{P})$ | success probability of proving algorithm |
| $\hat{m}$ | message output by the Extractor |
| $K$ | key (in a keyed scheme) |
| $S$ | tag (in a keyed scheme) |
| $\mathrm{dist}$ | hamming distance between two vectors |